I'm suggesting that if the agent has the power to read and send mail, it won't solve the security problems by sticking a human in the loop before the email is sent, because sufficiently devious attacks will get around that, and once such attacks are discovered they can be shared.
LLM-based agents don't have separate streams for instructions and data, and there's no reliable way to keep them from mistakenly interpreting data as instructions.
LLM-based agents don't have separate streams for instructions and data, and there's no reliable way to keep them from mistakenly interpreting data as instructions.