Hacker News | ramoz's comments

Both are dramatically over-engineered. And that's okay. I find them to be products of an industry reconciling how to really work with AI and how to optimize workflows around it. Similar to Gastown et al.

Otherwise, if you can own your own thinking, orchestrating, and steering of agents, you're in a more mature place.


I also see it as fleeting: right when you have it figured out, a new model will work differently and may or may not need all those engineering layers.

I think that's fair. If they were created today, I'm sure the creators would make different decisions; that's a penalty of getting there first.

This has aged well. Paradoxically, the more capable AI gets, the more important specification becomes (and the more costly its absence), and the more time you spend planning and iterating on intent before you let an agent act.

Tokens and code might be cheap, but we are moving closer to letting AI operate overnight, or to agents operating within our actual environments in real time. Tokens and actions in those settings carry higher costs.


Right now I enjoy the labs' CLI harnesses, Claude Code and Codex (especially for review). I do a bunch of niche stuff with Pi and OpenCode. My productivity is up. There are some nuances to working with others using the same AI tools: we all end up trying to boil the ocean at first, creating a ton of verbose docs and massive PRs, but we end up backing off from throwing up every sort of LLM output we get. Instead, we continuously refine the outputs into something consumable and trusted.

My workday is fairly simple. I spend all day planning and reviewing.

1. For most features, unless it's something small, I will enter plan mode.

2. We will iterate on the plan. I built a tool for this, and it seems to be a fairly desired workflow, given its popularity through organic growth: https://github.com/backnotprop/plannotator

  - This is a very simple tool that captures the plan through a hook (ExitPlanMode) and creates a UI where I can actually read the plan and annotate it, with QoL things like viewing plan diffs so I can see what the agent changed.
3. After the plan's approved, we eventually reach review of the implementation. I'll use AI reviewers, but I will also manually review using the same tool, so I can create annotations and iterate through a feedback loop with the agents.

4. I do a lot of this multitasking with worktrees now.
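The plan capture in step 2 hooks into Claude Code's settings. A sketch of what that wiring might look like (the exact command plannotator expects is an assumption; check the repo for the real invocation):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "ExitPlanMode",
        "hooks": [
          { "type": "command", "command": "plannotator" }
        ]
      }
    ]
  }
}
```

The matcher fires after the ExitPlanMode tool runs, which is when the plan text is available to hand off to an external UI.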

Embarrassingly, worktrees weren't something I truly understood the value of until a couple of weeks ago: https://backnotprop.com/blog/simplifying-git-worktrees/
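For anyone who hasn't tried them, the core mechanics are just a couple of commands. A minimal sketch (the repo and branch names are made up for illustration; it sets up a scratch repo so the commands run anywhere, but in practice you'd run the worktree commands from your existing checkout):

```shell
# Scratch repo purely for illustration.
set -e
cd "$(mktemp -d)"
git init -q demo && cd demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"

# One worktree per task: each gets its own working directory and branch,
# so multiple agents (or you) can work in parallel without stepping on
# each other's checkouts.
git worktree add ../demo-feature-x -b feature-x
git worktree list
```

When the branch is merged, `git worktree remove ../demo-feature-x` cleans up the checkout without touching the branch history.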


I've been working on a tool that makes worktrees play nicely with docker-compose setups, so you can run multiple localhost environments at once: https://coasts.dev/. It's free and open source. In my experience it's made worktrees 10x better, but I'd love to hear what other folks are doing about things like port conflicts and db isolation.
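One common trick for the port-conflict and db-isolation problems (not necessarily what coasts.dev does) is deriving a per-worktree Compose project name and port offset from the directory name. A hedged sketch; the variable names are mine, and it assumes your compose file maps `"${WEB_PORT}:3000"`:

```shell
# Derive a stable per-worktree identity from the directory name.
worktree=$(basename "$PWD")

# Hash the name into a small numeric offset (0-99) for host ports.
offset=$(( $(printf '%s' "$worktree" | cksum | cut -d' ' -f1) % 100 ))

# A distinct project name gives each worktree its own containers,
# networks, and named volumes -- which is db isolation for free.
export COMPOSE_PROJECT_NAME="$worktree"
export WEB_PORT=$(( 3000 + offset ))

echo "project=$COMPOSE_PROJECT_NAME web_port=$WEB_PORT"
# docker compose up -d   # compose file interpolates ${WEB_PORT}
```

Because the offset is a hash of the directory name, each worktree gets the same ports every time you bring its stack up.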

Worktrees slap.

I follow a very similar workflow, with manual human review of plans and continuous feedback loops across the plan iterations.

See me in action here. It's a quick demo: https://youtu.be/a_AT7cEN_9I


Thanks for the transparency. Sorry for the noise.

I think I'd be okay with a smaller, more narrative-detailed plan. It's not so much about verbosity, more about me understanding what is about to happen and why. There wasn't much back-and-forth once plan mode started (i.e., no Q&A). It would jump into its own planning and idle until all I saw was a set of projected code changes.


This stemmed from me asking Claude itself why it was writing such _weird_ plans with no detail (just a bunch of projected code changes).

Claude stated that its system prompt had strict instructions: provide no context or details, keep plans under forty lines of code, be terse.


This is Claude's own account of its system prompt. Can you verify it without going through Claude itself? There is still potential for hallucination.

There was a complete verification. This entire thread provides context around what I originally published, which I wouldn't recommend recreating.

Could you provide the details of the complete verification? In the original story you only showed Claude-like responses, not how you dug into the binary.

I understand. Thank you for sharing. I didn't uncover all of this until Claude told me its specific system instructions when I asked it to conduct introspection. I'll revise the blog so that I don't encourage anybody else to do deeper introspection with the tool.

As a divergent thinker who is harmed when Claude behaves in unpredictable ways that go counter to my extensive harm prevention protocols, I may have, or may not have, done deep investigation of the tool in order to understand how to create my harm prevention protocols. When Anthropic employees push out unstable work, developers in general are significantly impacted. When unstable products end up in my workflow I am harmed both financially AND psychologically. I can lose hours, days, even weeks to an unstable model or IDE. I should not EVER be tested on. And if diving into their product protects me, so be it.

I apologize for doing this, and I agree. I will revise.

I still think you have a point here. Doing this kind of testing on users unwittingly is unethical in my opinion

I understand. Just with AI, I don't think the behavior should change so drastically. Which I understand is paradoxical because we enjoy it when it can 10x or 1000x our workflow. I think responsible AI includes more transparency and capability control.

You rent AI; you don't own it (unless you self-host).

That ship has sailed. These models were trained unethically on stolen data, they pollute tremendously, and they are inflating a bubble that is hurting people.

"Responsible" and "ethical" are faaar gone.


I'll try posting again today, just because this is an active thing that I'm trying to get fixed.
