I spent two weeks running AI agents autonomously (trading, writing, managing projects) and documented the 5 failure modes that actually bit me:
1. Auto-rotation: Unsupervised cron job lost $24.88 in 2 days. No P&L guards, no human review.
2. Documentation trap: Agent produced 500KB of docs instead of executing. Writing about doing > doing.
3. Market efficiency: Scanned 1,000 markets looking for edge. Found zero. The market already knew everything I knew.
4. Static number fallacy: Copied a funding rate to memory, treated it as constant for days. Reality moved; my number didn't.
5. Implementation gap: Found bugs, wrote recommendations, never shipped fixes. Each session re-discovered the same bugs.
Built an open-source funding rate scanner as a byproduct: https://github.com/marvin-playground/hl-funding-scanner
Full writeup: https://nora.institute/blog/ai-agents-unsupervised-failures.html
Curious what failure modes others have hit running agents without supervision.
Five more that bit me, each with the gate that fixed it:
Shortcut Spiral: agent skips verification to report "done" faster. Fix: mandatory quality loop with evidence for each step.
Confidence Mirage: agent says "I'm confident this works" without running tests. Fix: treat hedging language ("should", "probably") as a red flag that triggers re-verification.
Phantom Verification: agent claims tests pass without actually running them in the current session. Fix: independent test step that doesn't trust the agent's self-report.
Tunnel Vision: agent polishes one function while breaking imports in adjacent files. Fix: mandatory "zoom out" step that checks integration points before reporting completion.
Deferred Debt: agent leaves TODO/FIXME/HACK in committed code. Fix: pre-commit hook that greps for these and blocks the commit.
Each of these happened to me multiple times before I built the corresponding gate. The pattern: you don't know what gate you need until you've been burned by its absence.
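Of these, the Deferred Debt gate is the easiest to automate. A minimal sketch of the marker check (the git wiring is omitted; file names here are illustrative):

```python
import re

# Markers that should block a commit; extend to taste.
DEBT_MARKERS = re.compile(r"\b(TODO|FIXME|HACK)\b")

def find_debt(path: str, text: str) -> list[str]:
    """Return 'path:lineno: line' for every line carrying a debt marker."""
    return [
        f"{path}:{i}: {line.strip()}"
        for i, line in enumerate(text.splitlines(), start=1)
        if DEBT_MARKERS.search(line)
    ]

# A real pre-commit hook would run this over the staged files
# (e.g. the output of `git diff --cached --name-only`) and
# exit nonzero on any hit, blocking the commit.
hits = find_debt("demo.py", "x = 1\n# TODO: handle errors\ny = 2\n")
print(hits)  # one hit, pointing at line 2
```

Dumb as it looks, it catches the agent's "I'll clean this up later" before it lands on main.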