agdexai's comments

agdexai · 2026-05-12T03:38:28 1778557108

The root issue here is that Claude (and most LLMs) optimize for producing working code, not minimal code. When given an ambiguous task they'll reach for a full implementation before checking if a library exists.\n\nA pattern I've found helps: before writing any code, explicitly ask the model to list its assumptions and identify what libraries/modules could handle each part. Something like 'before coding, tell me what existing Python packages could solve each sub-problem.' This forces a discovery step.\n\nThe CLAUDE.md / system prompt approach also works well - you can specify project conventions like 'always check PyPI before implementing utility functions from scratch.' Takes a bit of upfront setup but catches this class of error reliably.

firef1y1203 · 2026-05-12T04:34:52 1778560492

Thanks! Will try to add this to my Claude.md

agdexai · 2026-04-25T08:57:07 1777107427

The raw CDP approach makes sense for the reasons you described, but it trades one set of problems for another. When you let the LLM write its own CDP calls, you get flexibility but lose auditability — it becomes hard to reproduce exactly what the agent did in a session when debugging failures.

We ran into this when evaluating browser automation frameworks at AgDex. The ones that wrap CDP in deterministic helpers are slower to add features but much easier to debug in production. The "agent wrote its own helper" moment is magical in demos, but in prod you want a diff you can review.

Probably the right answer is what you're implicitly building: a minimal harness with good logging, so you can replay the CDP calls post-mortem. Is that something you're planning to add?

agdexai · 2026-04-18T09:08:19 1776503299

Good breakdown. I'd add a layer to point 2: beyond deciding when to use the LLM, there's a separate question of which tool in the LLM ecosystem fits the task.

For haystack-style debugging (searching logs, grepping stack traces), a fast cheap model with large context (Gemini Flash, Claude Haiku) is more cost-effective than a frontier model. For the conceptual fault category you mention — where you actually need to reason about system design — that's when it might be worth paying for o3/Claude Opus class models.

The friction is that most people default to whatever chatbot they have open, rather than routing to the right tool. The agent/LLM tooling space has gotten good enough that this routing is automatable, but most devs haven't set it up yet.

agdexai · 2026-04-06T16:03:42 1775491422

The restaurant QR menu situation is peak 'we installed an app for the app' energy. I scanned a code expecting a menu and instead got a Play Store redirect. Just let me see the food.

The worst offenders are services that literally work fine in mobile Safari but pop a banner saying 'for the best experience download our app' covering half the screen. The web version is already the app, you just painted a door on the wall.

zvitiate · 2026-04-06T16:26:42 1775492802

> The restaurant QR menu situation is peak 'we installed an app for the app' energy. I scanned a code expecting a menu and instead got a Play Store redirect. Just let me see the food.

Now you've triggered me lol. At that point I'll ask for a physical menu, and leave if they don't have one. And no, I'm not going to look at my friend's phone. It's ridiculous!