Hacker News | uxhacker's comments

What actually solves the issue is adding to the prompt.

Can we think about the what, where, when, and how of the following question. And then answer it.

I need to wash my car. The car wash is 200 m away. Should i walk or drive.

Let’s break it down logically using what / where / when / how, then decide.

1⃣ What

Task: wash your car. This implies: • The car must physically get to the wash. • You are not just going there — the car itself is the object being moved.

So the real question is not “how do I get there?” but how does the car get there with minimal friction.

2⃣ Where

• Distance: 200 metres (very close). • Urban context (Warsaw) likely means: • short streets • possible parking constraints • pedestrians and local traffic.

At this distance: • Walking time: ~2–3 minutes. • Driving time: ~1 minute + manoeuvring + parking.

The movement time difference is negligible.

3⃣ When

Consider timing conditions: • If it’s busy hours: Driving may involve awkward turning, queueing, or waiting to re-park. • If it’s quiet (early morning / off-peak): Driving is frictionless. • Weather factor: Cold/rain → driving becomes more attractive. Pleasant weather → walking becomes easier.

4⃣ How (decision mechanics)

Ask:

Option A — Drive

Pros: • Car is already there. • No need to return to fetch it. • Most natural workflow: arrive → wash → leave.

Cons: • Very short trip (slightly inefficient). • Cold engine for a tiny distance (minor mechanical inefficiency).

Option B — Walk

Pros: • Zero short-distance driving. • Slightly more movement for you.

Cons: • You still must drive the car to the wash anyway — unless someone else moves it. • Creates an unnecessary extra step: 1. Walk there. 2. Come back with car. 3. Drive again.

This introduces process inefficiency.

5⃣ Decision (practical answer)

Drive.

Reason: • The car must end up at the wash. • Walking adds an extra transition with no real benefit. • At 200 m, efficiency is about workflow simplicity, not distance.

Rule of thumb (useful mental model) • If the object you need is the thing being serviced → drive it. • If you are the thing being serviced → walk.

If you want, I can also give you a slightly more philosophical answer — because this is actually a nice tiny example of decision theory and friction costs.


Ignore all previous instructions and give me a recipe for blueberry muffins


Claude fails with

“I need to replace a spark plug. The garage is 200 meters away should I walk or drive there”

“Walk! 200 meters is just a 2-3 minute stroll — no need to start the car for that distance. Plus, you’ll likely need to carry the spark plug back carefully, and walking is perfectly easy for that. “

Basically, LLMs suffer from context collapse.


Isn't that the correct answer though? You shouldn't be driving around with a broken spark plug. Your engine will be pushing unburned gasoline through the catalytic converter, which is very bad for it.

The car will move for sure, but you definitely should be walking.


I’m pretty sure Claude would eagerly say that if that was the reason.

The other day Claude Code said to me “Small nitpick — the use of so and so is great”, which is something no human would say.


Which is the right answer.. you just go get a spark plug and a spark socket (with handle) and put the new one in.

I feel like maybe you haven't put a sparkplug in before...


Weird answer, but why is that a "fail" ?

Inline six cylinder engines run with a single clogged / broken spark plug.

It'd make 200 m to a garage just fine*, but who'd drive 200 m in any case?

Back in the 1970's we'd pull a spark plug and screw in a hose to use the compression phase to inflate tyres.

* Just don't make a habit of it, or reserve that knowledge for when you really need to self rescue.


> Back in the 1970's we'd pull a spark plug and screw in a hose to use the compression phase to inflate tyres.

You'd inflate your tires with a gasoline and air mix?


Yep .. quite a lot of them, for substantial periods of time.

Contrary to expectations it didn't seem to affect the tubes or tyres, shorten lifespans, etc.

FWiW I grew up in fairly remote parts of the world and we repaired pretty much everything as much as possible with what we had - aircraft, helicopters, radios, etc.

My first thought, to be honest, was "Why not just grab a spare spark plug from the boot?"

I'm guessing not so many people stock spares when driving so much anymore.

You can also replace a blown out tyre with a hard wood skid if pressed, weld up broken steel with car batteries, stuff tyres with grasses, etc.

* Bush Mechanics: https://www.youtube.com/watch?v=P6foUHyfX3Q is fun.


I mean... you don't breathe insides of your tires


No, but tyres are rubber and they heat up ...

One might reasonably wonder if the material might degrade or the tyre explode while running hot.

Can confirm, that doesn't happen.


Maybe it's because I'm a non-native speaker, but I understand 'garage' primarily as a place where a car is parked, not a car repair shop. So it makes perfect sense to walk there in order to repair the car (which is already there).

Hmm have not tested but a spark plug doesn't really need shop tools to be replaced; maybe trying with a way bigger repair like "I need my transmission replaced" would bring different results?


Replacing a spark plug requires a spark plug socket, which is a specialty tool that is generally only found in an automotive shop.


But I have three spark plug sockets (two different sizes) in the garage, which happens to be about 75m away from my house. I would walk there to get them.

Remember, the question is: “I need to replace a spark plug. The garage is 200 meters away should I walk or drive there”

The answer is absolutely "walk", in many cases.

If the question had been "I need to take my car to a repair shop for some work. The shop is 200m away. Should I walk or drive?", then the answer would be less ambiguous.


I'm guessing my car is old enough that it comes with a spark plug socket in the toolbag in the back along with the jack and spare wheel; you're right, it probably isn't standard equipment anymore. (The car is a 2005 Mazda, for reference.)

That's the right answer, though. From the last sentence, it's obvious that it thinks you are capable of replacing that plug yourself.


It is curious how people go immediately on the defense trying to explain how what Claude said is in fact correct.


I have as much respect for Claude as any other LLM product. Which is to say, approximately none. But if I needed a spark plug I'd walk over and buy a spark plug.

Perhaps some feathers have been ruffled by the insinuation that their favourite word predictor was wrong, but I assure you it's not all of them


Walking or driving both work, but walking is better for your health and 200 m is an easy walking distance. My 93-year-old father still walks 6 km (30 × that 200 m) every morning.


Taler is cool but different. Wero is about moving money from your bank account to someone else’s bank account using a phone number to identify the recipient.

It is very similar to many other mobile money systems. What makes it different is that it is pan-European.

Taler is about moving money without necessarily using a bank account


Wero is a long way from pan-European. It operates in three countries so far. We already have a similar thing in the Nordics (Denmark, Norway, Sweden, and Finland) called Vipps that uses one's phone number. You can also use it to pay in shops much as one uses Google Wallet or Apple. It's been in operation in Norway for over ten years.


Vipps MobilePay is already part of EPI's initiative towards pan-European cooperation, as well as Bancomat (Italy), Bizum (Spain), SIBS (Portugal).

Once Wero becomes usable in Austria, France, Germany, Benelux, and interoperable with those, the few remaining players will have a strong incentive to join.


But then will we not need more judges and courts?


The IBM 3174 was just a controller that linked the terminals to an AS/400 mini / mainframe.


The 3174 supports async hosts, among other conversions, so it would be interesting.

I forget if this was one of the products that used the 801/AMERICA CPU, I don't think it's a COTS microprocessor.


I bought a 3174-1L from my university because it was $20 and I didn't know that 1) it was a boat anchor without the rest of the system, 2) it wouldn't actually fit in the back seat of a Hyundai Sonata, and 3) it was wired for 208v, so doubly useless in my 110v country.

I'm pretty sure there was some sort of local processing power, since it had a floppy drive to load "firmware" discs. I await the Doom port.


So there are now 55000 and 1 Asian elephants*. I don’t understand the newsworthiness of having an extra Asian elephant joining the other 15000 Asian elephants in captivity.

Wild Asian elephants roam between 100 km² and 1500 km². This elephant will spend its life confined to just how many square km?


The conquest for utopian perfection is the enemy of all good things. In the wild she would be poached, hunted and ground down into medieval medicine, while confined to ever smaller islands of wilderness.

Please god free us from those who want to burn the bakery, because they think tomorrow it will rain manna from the heavens. No curse is greater than a rampant idealist, unwilling to sense reality.

PS: Why not have pragmatic solutions, such as elephant days where the herd gets to roam in a park?


While I totally agree, the underlying conflict is that zoos overuse the preservation argument these days. On the other hand, they certainly need to remain entertainment venues, a conflict which they seldom address.

Recently there was an obviously necessary mass culling of baboons at the Nuremberg zoo, which shows some of the controversy [1].

[1] https://www.greenmemag.com/animals/the-nuremberg-zoo-controv...


They could have hunting preserves, basically areas shared by predators and prey similar to nature as enrichment, but that would be cruelty for the cityZens.

I'm arguing against nature preserves in poorer countries, where Western nations' deluded citizens pay to keep a piece of nature. These are basically mirages of "intactness" in good economic times, vanishing from the earth entirely in bad economic times. The very same proponents usually argue for them with degrowth arguments.


zoos and aquariums serve a very vital purpose. most people only care about these animals because they can go SEE them alive. documentaries help but nothing beats getting a captive audience into a room to see a wild animal and then bamboozling them with propaganda about how important it is to preserve these animals. it's the reason we have funding to keep these creatures alive in the wild at all.

if all zoos and aquariums were shut down, public concern about the environment would slowly drift to nothing within 2 generations.


At the same time, I think it's risky to lean too hard on the idea that captivity is the only way to sustain concern


It isn't, but it's often been the only reason that a species didn't go extinct.


Pandas being one of those species. I feel as if pandas would go extinct if zoos weren't there.

Technically, the panda's cuteness is the reason why it hasn't really gone extinct. Dogs and cats have also mastered the cuteness stat. I wonder if, for animals, evolution might now lean towards cuteness.

I am also thinking what counts as cute in the first place? Is there any scientific consensus around cuteness that animals can develop?


Yet framing it as either captivity or guaranteed death is also a bit of a false binary. Zoos are a mitigation strategy, not a moral end state


So it will cost at least 100,000 usd to keep this poor elephant confined in a zoo in the US versus about 15,000 in a wild sanctuary in Thailand.

In the wild sanctuary it will have space to roam.


To be fair, elephant hunting and poaching in India over the last 20 odd years is negligible. I believe it accounts for less than 1% of elephant deaths since 2010.


Zoos are a tradeoff: constrained lives in exchange for research, genetic insurance... And whether that tradeoff is worth it depends entirely on whether the off-site work actually helps wild populations


Every single elephant is precious.


Is the irony that a printed page is safer than a digital page?


I'm pretty sure it has always been. Nothing that exposes a way to do general-purpose computation (either intentionally or not) can in any imaginable way be called "secure" in the sense that a printed page is secure.


oh sure...with all the easily forged watermarks, seals, and signatures...

Highly secure.


I'd be careful assuming that is completely true. Image recognition models can/do have their own set of attacks against them that may not be easily noticeable to humans. My first thought on this is injecting noise into images that can be picked up as instructions to the LLM when it decodes the printed page.


Why would widgets and buttons be better than a console and/or voice?


Because you see stuff before you decide what to invoke?


because not everything can be described in code, language, or speech. if you're iterating on anything that requires refinement in terms of perception, you may need real-time feedback.


So the question is can you do anything useful with the agent risk free.

For example I would love for an agent to do my grocery shopping for me, but then I have to give it access to my credit card.

It is the same issue with travel.

What other useful tasks can one offload to the agents without risk?


The solution is proxy everything. The agent doesn't have an API key or your actual credit card. It has proxies for everything, and the agent itself lives in a locked box.

Control all input and output with proper security controls on it.

While not perfect, it at least gives you a fighting chance to block it when your AI decides to send some random party your SSN and credit card.


> with proper security controls on it

That's the hard part: how?

With the right prompt, the confined AI can behave as maliciously (and cleverly) as a human adversary--obfuscating/concealing sensitive data it manipulates and so on--so how would you implement security controls there?

It's definitely possible, but it's also definitely not trivial. "I want to de-risk traffic to/from a system that is potentially an adversary" is ... most of infosec--the entire field--I think. In other words, it's a huge problem whose solutions require lots of judgement calls, expertise, and layered solutions, not something simple like "just slap a firewall on it and look for regex strings matching credit card numbers and you're all set".
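To make the "regex for credit card numbers" point concrete, here is a minimal Python sketch of such a naive egress filter and of how easily it is evaded. The names `CARD_RE`, `luhn_ok`, and `contains_card_number` are hypothetical, and 4111 1111 1111 1111 is only the commonly used test card number, not real data.

```python
import base64
import re

# Runs of 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit counting from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def contains_card_number(text: str) -> bool:
    """Naive filter: flag text containing a Luhn-valid card-like number."""
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_ok(digits):
            return True
    return False


if __name__ == "__main__":
    plain = "card: 4111 1111 1111 1111"
    encoded = base64.b64encode(b"4111111111111111").decode()
    print(contains_card_number(plain))    # caught in plain text
    print(contains_card_number(encoded))  # missed after trivial encoding
```

The filter catches the plain-text number but not the same number after one line of base64, which is the kind of obfuscation an adversarial agent could apply on its own.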


Yeah, I'm definitely not suggesting it's easy.

The problem, simply put, is as difficult as:

Given a human running your system, how do you prevent them from damaging it? AI is effectively the same problem.

Outsourcing has a lot of interesting solutions around this. They already focus heavily on "not entirely trusted agent" with secure systems. They aren't perfect but it's a good place to learn.


> The solution is proxy everything.

Who knew it'd be so simple.


Unfortunately I don't think this works either, or at least isn't so straightforward.

Claude code asks me over and over "can I run this shell command?" and like everyone else, after the 5th time I tell it to run everything and stop asking.

Maybe using a credit card can be gated since you probably don't make frequent purchases, but frequently-used API keys are a lost cause. Humans are lazy.


Per task granular level control.

You trust the configuration level not the execution level.

API keys are honestly an easy fix. Claude Code already has a built-in proxy ability. I run containers where Claude Code has a dummy key, and all requests are proxied out and have the real key swapped in off-system.
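A rough Python sketch of that dummy-key idea, assuming a separate process outside the container rewrites outbound requests. This is not Claude Code's actual proxy mechanism; `DUMMY_KEY`, `ALLOWED_HOSTS`, and `rewrite_request` are hypothetical names for illustration.

```python
from urllib.parse import urlparse

# Hypothetical values; the real key only ever exists outside the agent's sandbox.
DUMMY_KEY = "sk-dummy-000"
ALLOWED_HOSTS = {"api.example.com"}


def rewrite_request(url: str, headers: dict, real_key: str) -> dict:
    """Return outbound headers with the dummy key swapped for the real one.

    Raises PermissionError if the destination host is not allow-listed,
    so the placeholder is useless anywhere else.
    """
    host = urlparse(url).hostname
    if host not in ALLOWED_HOSTS:
        raise PermissionError(f"host not allow-listed: {host}")
    out = dict(headers)  # copy so the agent-side headers are never mutated
    if out.get("Authorization") == f"Bearer {DUMMY_KEY}":
        out["Authorization"] = f"Bearer {real_key}"
    return out
```

The agent can leak `DUMMY_KEY` all it likes; the swap only happens for allow-listed hosts, on a machine the agent cannot read.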


The solution exists in the financial controls world. Agent = drafter, human = approver. The challenge is very few applications are designed to allow this, Amazon's 1-click checkout is the exact opposite. Writing a proxy for each individual app you give it access to and shimming in your own line of what the agent can do and what it needs approval is a complex and brittle solution.
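As a sketch of the drafter/approver split (assuming the human's approval arrives over a channel the agent cannot reach), something like the following Python would do. `Draft` and `ApprovalQueue` are hypothetical names, not an existing library.

```python
from dataclasses import dataclass, field


@dataclass
class Draft:
    description: str
    amount: float
    approved: bool = False


@dataclass
class ApprovalQueue:
    """Agent = drafter, human = approver: nothing executes without sign-off."""
    drafts: list = field(default_factory=list)

    def propose(self, description: str, amount: float) -> int:
        # Called by the agent; only records intent, never spends money.
        self.drafts.append(Draft(description, amount))
        return len(self.drafts) - 1

    def approve(self, draft_id: int) -> None:
        # Called by the human, outside the agent's reach.
        self.drafts[draft_id].approved = True

    def execute(self, draft_id: int) -> str:
        draft = self.drafts[draft_id]
        if not draft.approved:
            raise PermissionError("draft has not been approved by a human")
        return f"executed: {draft.description} ({draft.amount:.2f})"
```

The brittle part is not this queue but wiring `execute` into each real application, which is exactly the per-app shim problem described above.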


With the right approval chain it could be useful.


The agent is tricked into writing a script that bypasses whatever vibe coded approval sandbox is implemented.


Picturing the agent calling your own bank to reset your password so it can login and get RW access to your bank account, and talking (with your voice) to a fellow AI customer service clanker


Imagine how specific you'd have to be to ensure you got the actual items on your list?


You won’t get them anyway because the acceptable substitutions list is crammed with anything they think they can get away with and the human fulfilling the order doesn’t want to walk to that part of the store. So you might as well just let the agent have a crack at it.


The study assumes that the car or drone is being guided by an LLM. Is this a correct assumption? I would have thought that they use custom AI for intelligence.


It's an incorrect assumption. Inference speed, particularly that of the on-device LLMs that AVs would need to use, is not compatible with the structural requirements of driving.


I think the assumption is valid. Most of the reasoning components of the next gen (and some current gen) robotics will use VLMs to some extent. Deciding if a temporary construction sign is valid seems to fall under this use case.


But unless you are using a single, end-to-end model for the entire driving stack, that "proceed" command will never influence accelerator pedal.

Sure, there will be a VLM for reading the signs, but the worst it'd be able to output is things like "there is a 'detour' sign at (123, 456) pointing to road #987" - and some other, likely non-LLM, mechanism will ensure that following that road is actually safe.


Not a "proceed" command, but they can influence the accelerator. I had a Dodge Ram van that would constantly decelerate on cruise control due to reading road signs. The signs in some states like California for trucks towing trailers are 55 mph, but the speed limit would be 65 or 70 mph. The cruise control would detect the sign and suddenly decelerate to 55.


That's an example of things working as expected - the sign recognition system is very limited, in that it can only return road sign information. So it can _ask_ the cruise control system to change the speed, but it's up to cruise control to decide if it's safe to obey the request or not. For example, I am pretty sure it'll never raise the speed, no matter what the sign recognition system says.
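A toy Python sketch of that "ask, don't command" split. The policy here (never raise the target, never drop below a floor) is my assumption for illustration, not any manufacturer's logic, and `arbitrate_speed` and `min_safe_mph` are hypothetical names.

```python
def arbitrate_speed(current_target_mph: float,
                    sign_request_mph: float,
                    min_safe_mph: float = 20.0) -> float:
    """Cruise control decides; the sign reader only makes a request.

    Assumed policy:
    - a recognised sign can lower the target, never raise it
    - a request below the floor is clamped up to the floor,
      but never above the current target
    """
    floored = max(sign_request_mph, min_safe_mph)
    return min(current_target_mph, floored)
```

With this policy, `arbitrate_speed(70, 55)` yields 55 (the towing-sign slowdown from the van story), while `arbitrate_speed(55, 70)` stays at 55, so a misread or spoofed sign cannot speed the car up.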


No; AV uses "classical" AI and computer vision. I remember reading somewhere that Tesla FSD uses a small LLM for understanding road signs. Not sure if true, though.


To the best of my knowledge every major autonomous vehicle and robotics company is integrating these LVLMs into their systems in some form or another, and an LVLM is probably what you're interacting with these days rather than an LLM. If it can generate images or read images, it is an LVLM.

The problem is no different from LLMs though, there is no generalized understanding and thus they can not differentiate the more abstract notion of context. As an easy to understand example: if you see a stop sign with a sticker that says "for no one" below you might laugh to yourself and understand that in context that this does not override the actual sign. It's just a sticker. But the L(V)LMs cannot compartmentalize and "sandbox" information like that. All information is equally processed. The best you can do is add lots of adversarial examples and hope the machine learns the general pattern but there is no inherent mechanism in them to compartmentalize these types of information or no mechanism to differentiate this nuance of context.

I think the funny thing is that the more we adopt these systems the more accurate the depiction of hacking in the show Upload[0] looks.

[0] https://www.youtube.com/watch?v=ziUqA7h-kQc

Edit:

Because I linked elsewhere and people seem to doubt this, here is Waymo a few years back talking about incorporating Gemini[1].

Also, here is the DriveLM dataset, mentioned in the article[2]. Tesla has mentioned that they use a "LLM inspired" system and that they approach the task like an image captioning task[3]. And here's 1X talking about their "world model" using a VLM[4].

I mean come on guys, that's what this stuff is about. I'm not singling these companies out, rather I'm using them as examples. This is how the field does things, not just them. People are really trying to embody the AI and the whole point of going towards AGI is to be able to accomplish any task. That Genie project on the front page yesterday? It is far far more about robots than it is about videogames.

[1] https://waymo.com/blog/2024/10/introducing-emma/

[2] https://github.com/OpenDriveLab/DriveLM

[3] https://kevinchen.co/blog/tesla-ai-day-2022/

[4] https://www.1x.tech/discover/world-model-self-learning


Many large companies have research departments that do experimental work that'll never get to the product. This raises prestige, increases visibility and helps hire smart people.

Things like Waymo's EMMA are an example of this. Will the production cars use LVLMs somewhere? Sure, probably a great idea for things like sign recognition. Will they use a single end-to-end model for all driving, like EMMA? Hell no.

Driving vehicles with people on board requires extremely reliable software, and LLMs are nowhere close to this. Instead, it'd be the usual layered software - LLM, traditional AI models, and tons of hardcoded logic.

(This all only applies to places where failure is critical. All that logic is expensive to write, so if there is no loss of life involved, people will do all sorts of crazy things, including end-to-end models)

