Both are dramatically over-engineered, and that's okay. I see them as products of an industry still figuring out how to actually work with AI and how to optimize workflows around it. Similar to Gastown et al.
Otherwise, if you can own your own thinking, orchestrating, and steering of agents, you're in a more mature place.
This has aged well. Paradoxically, the more capable AI gets, the more important specification becomes (or the costlier its absence becomes), and the more time you spend planning and iterating on intent before you let an agent act.
Tokens and code might be cheap, but we're moving closer to letting AI operate overnight, or to having agents operate within our actual environments in real time. In those settings, tokens and actions carry higher costs.
Right now I enjoy the labs' CLI harnesses, Claude Code and Codex (especially for review). I do a bunch of niche stuff with Pi and OpenCode. My productivity is up. One nuance of working with others on the same AI tools: we all try to boil the ocean at first, creating a ton of verbose docs and massive PRs, but over time we stop throwing up every sort of LLM output we get and instead continuously refine the outputs into something consumable and trusted.
My workday is fairly simple. I spend all day planning and reviewing.
1. For most features, unless it's something small, I'll enter plan mode.
2. We iterate on the plan. I built a tool for this, and it seems to be a fairly desired workflow, given its organic growth in popularity. https://github.com/backnotprop/plannotator
- It's a very simple tool that captures the plan through a hook (ExitPlanMode) and creates a UI where I can actually read and annotate the plan, with quality-of-life things like plan diffs so I can see what the agent changed.
3. After the plan is approved, we eventually reach implementation review. I'll use AI reviewers, but I also review manually with the same tool so I can create annotations and iterate through a feedback loop with the agents.
4. I do a lot of this multitasking with worktrees now.
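The worktree part of the workflow above can be sketched as follows; the repo and branch names are purely illustrative, and the demo builds a throwaway repo so it's self-contained:

```shell
# Hypothetical sketch: one git worktree per feature, so parallel agent
# sessions get separate checkouts without stepping on each other's files.
set -e
repo="$(mktemp -d)/myapp"                 # throwaway repo for the demo
git init -q -b main "$repo" && cd "$repo"
git -c user.name=dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "init"
# parallel checkout on its own branch, as a sibling directory
git worktree add -b feature-x ../myapp-feature-x
git worktree list                         # shows both checkouts
```

Each worktree is a full working directory sharing the same object store, so switching between agent tasks is just a `cd`.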
I've been working on a tool that makes worktrees play nicely with docker-compose setups, so you can run multiple localhost environments at once: https://coasts.dev/. It's free and open source. In my experience it's made worktrees 10x better, but I'd love to hear how other folks handle things like port conflicts and db isolation.
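For the port-conflict and db-isolation questions, one common pattern is to parameterize the compose file per worktree; this is a minimal sketch under assumed service names and ports, not how coasts.dev actually works:

```yaml
# Hypothetical compose fragment: host port and project name vary per worktree.
services:
  web:
    build: .
    ports:
      - "${WEB_PORT:-3000}:3000"   # set a different WEB_PORT in each worktree
  db:
    image: postgres:16
    volumes:
      - dbdata:/var/lib/postgresql/data   # namespaced by COMPOSE_PROJECT_NAME
volumes:
  dbdata:
```

Running something like `COMPOSE_PROJECT_NAME=myapp-feature-x WEB_PORT=3001 docker compose up -d` in each worktree then gives every checkout its own containers, network, and volumes, since compose prefixes all of them with the project name.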
I think I'd be okay with a smaller, more narrative-detailed plan: not so much about verbosity, more about me understanding what is about to happen and why. There wasn't much back-and-forth once it entered planning mode (i.e., no Q&A). It would jump into its own planning and idle until all I saw was a set of projected code changes.
Could you provide the details of the complete verification?
*In the original story you only showed Claude-like responses, not how you dug into the binary.
I understand. Thank you for sharing. I didn't uncover all of this until Claude told me its specific system instructions when I asked it to conduct introspection. I'll revise the blog so that I don't encourage anybody else to do deeper introspection with the tool.
As a divergent thinker who is harmed when Claude behaves in unpredictable ways that run counter to my extensive harm-prevention protocols, I may or may not have done a deep investigation of the tool in order to understand how to create those protocols. When Anthropic employees push out unstable work, developers in general are significantly impacted. When unstable products end up in my workflow I am harmed both financially AND psychologically. I can lose hours, days, even weeks to an unstable model or IDE. I should not EVER be tested on. And if diving into their product protects me, so be it.
I understand. Just with AI, I don't think the behavior should change so drastically. Which I understand is paradoxical because we enjoy it when it can 10x or 1000x our workflow. I think responsible AI includes more transparency and capability control.
That ship has sailed. These models were trained unethically on stolen data, they pollute tremendously, and they are fueling a bubble that is hurting people.