There are two fundamental vulnerabilities here to my mind that I think are worth learning from:
1. Sanitize LLM output
2. Always outline to the user and make them confirm what actions a chat assistant is going to do before doing it, particularly if they are sensitive.
Edit: this is echoed in the article
"Despite this, there are ways people creating generative AI systems can defend against potential worms, including using traditional security approaches. “With a lot of these issues, this is something that proper secure application design and monitoring could address parts of,” says Adam Swanda, a threat researcher at AI enterprise security firm Robust Intelligence. “You typically don't want to be trusting LLM output anywhere in your application.”
Swanda also says that keeping humans in the loop—ensuring AI agents aren’t allowed to take actions without approval—is a crucial mitigation that can be put in place. “You don't want an LLM that is reading your email to be able to turn around and send an email. There should be a boundary there.” For Google and OpenAI, Swanda says that if a prompt is being repeated within its systems thousands of times, that will create a lot of “noise” and may be easy to detect."
Outlining the actions could get tricky in practice, though, if people don't dive into the definition and parameters of every "action" their LLM suggests. Let's say the "actions" are bundled up in third party "apps" or whatever. Sending email is pretty straightforward, but we'll eventually get more complicated ones, many of which will be legitimate, but some could be malicious (or replaced over time with malicious versions). There's a risk people will get used to just rubber stamping unfamiliar actions if they sound plausible.
There are also parameters: the text of the email to send is an obvious one, but other actions could be less obvious, and an evil prompt could induce a user to approve bad parameters to an action that's usually fine.
Your mail processing agent says it is going to send a mail, as requested by your boss, to a customer. You can see the text of the message it proposes to send. You can see your boss's email. But instead it is sent to a black hat with proprietary info extracted from your entire inbox steganographically encoded in whitespace.
1. How does it get sent to a black hat, the user was shown the recipient.
2. The whitespace steganography falls under a sanitization problem, but I concede that "hiding" messages can be a problem. Can't imagine the bandwidth being large at all, let alone encode your contacts and inbox.
If your mail processing agent can lie about the address it's sending to, it can lie about the contents too, so there's no need for steganography. Though it's hard to imagine a realistic scenario where it can lie to you so completely but still needs permission to send an email, so I'm not sure what you're doing here.
I'm suggesting that if the agent has the power to read and send mail, it won't solve the security problems by sticking a human in the loop before the email is sent, because sufficiently devious attacks will get around that, and once such attacks are discovered they can be shared.
LLM-based agents don't have separate streams for instructions and data, and there's no reliable way to keep them from mistakenly interpreting data as instructions.
I wonder if silhouette/pattern disruption will become the norm (via reflective tape or adversarial attack), or if using physical means will always be playing catchup to model updates.
a tank - 1000kw of heat - on a thermal camera is very hard to hide. And drones are frequently carry thermal in addition to standard camera.
For smaller targets, like humans, thermal-stealth clothing plus standard background patterning would probably work to some degree, though I think that soon those autonomous
drones will in huge numbers be spread on the battlefield and sit quietly waiting like landmines and would be able to identify by movements/etc their targets even when the targets are well-blended into the background.
There's an urban legend about AI and tanks whose punchline was that: on the battlefield all tanks looked Soviet, because the images of Soviet tanks in the training set were the blurry ones taken from the air.
I don't know if anyone outside the actual developers (and given jamming perhaps not even them) really knows how accurate such onboard AI is right now.
>There's an urban legend about AI and tanks whose punchline was that: on the battlefield all tanks looked Soviet, because the images of Soviet tanks in the training set were the blurry ones taken from the air.
to that I raise you the story of the Soviet mine carrying dogs who were trained to destroy tanks. As it happened they were trained using Soviet tanks. And Soviet tanks used diesel. German tanks were gasoline-powered...
>I don't know if anyone outside the actual developers (and given jamming perhaps not even them) really knows how accurate such onboard AI is right now.
The battlefield is constantly watched by a multitude of drones, including ones far and high, outside of jamming range, and with good optics. So, all the info is there, though not in the open naturally. It is reported by various sources that despite the jamming (Russia even installs the jamming right onto the tanks) and everything, 70% of armor is killed with drones currently. Though Eric Schmidt making such drones for Ukraine would know better :)
The users attention is a infinite Ressource and the great cope out of all security. Do your jobs, stop being lazy, is not something you can yell at the user, when you turn their days into endless amounts of legal busy work.
It isn't surprising at all that this can me done; I figured that as soon as people started building LLM based agents that can read and send email we would see these. There is no solid barrier to having the agent see something in an email as a command; it can be limited with prompts and training but no one has found a bulletproof way to constrain the behavior.
Indeed, you always need two channels, one secure to a trusted computing base and one for the rest, whenever you might be dealing with potentially malicious programs or data. This is true if window systems/UI as well. I first saw this discussed in the EROS trusted window system, and you can see similar things in Qubes.
Conflating data and commands/ instructions has a long history of security problems in von Neuman architectures.
"security" "researchers" "prove" what everyone knew was obvious within 30 seconds of learning about LLMs and make "first of a kind" "terrifying new advance".
such nonsense from the security community. do something useful stop trying to scare everyone for a paycheck
there's no innovation here. the clickbait article authors did more work than the "researchers"
While that might be your opinion, that is not the guidance of the FAQ If a story has not had significant attention in the last year or so, a small number of reposts is ok. Otherwise we bury reposts as duplicates. . 4 posts is not a small number, one of those posts was already correctly auto-flagged by the system as a dupe.
This post got around that by using a syndicated alternative source which clearly states it's verbatim a Wired article, which is against the guidelines[1]: Please submit the original source
No, it's not just my opinion, it's how HN works, you can find lots of moderator comments about it. The thing in the FAQ about reposts is someone posting the same thing over and over. Different people posting the same story is perfectly normal, there can't possibly be a rule against that that would make sense. Lots of stories have many more than three reposts, big, popular stories can have dozens. They don't really matter until something catches and gets a thread.
If you think the wired story or author's writeup (https://sites.google.com/view/compromptmized) would be a better link for this submission you can email hn@ycombinator.com with the better url and it will get fixed.
1. Sanitize LLM output
2. Always outline to the user and make them confirm what actions a chat assistant is going to do before doing it, particularly if they are sensitive.
Edit: this is echoed in the article
"Despite this, there are ways people creating generative AI systems can defend against potential worms, including using traditional security approaches. “With a lot of these issues, this is something that proper secure application design and monitoring could address parts of,” says Adam Swanda, a threat researcher at AI enterprise security firm Robust Intelligence. “You typically don't want to be trusting LLM output anywhere in your application.”
Swanda also says that keeping humans in the loop—ensuring AI agents aren’t allowed to take actions without approval—is a crucial mitigation that can be put in place. “You don't want an LLM that is reading your email to be able to turn around and send an email. There should be a boundary there.” For Google and OpenAI, Swanda says that if a prompt is being repeated within its systems thousands of times, that will create a lot of “noise” and may be easy to detect."