Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I'm suggesting that if the agent has the power to read and send mail, it won't solve the security problems by sticking a human in the loop before the email is sent, because sufficiently devious attacks will get around that, and once such attacks are discovered they can be shared.

LLM-based agents don't have separate streams for instructions and data, and there's no reliable way to keep them from mistakenly interpreting data as instructions.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: