Hacker News | new | past | comments | ask | show | jobs | submit | peterbell_nyc's comments

Love the DDD callout. I have explicit steps to review and rate deltas to the ubiquitous language, and one of my architectural reviewers will often engage with me about where the bounded contexts should be and where the translation layers belong.

I find the more good practices I add to my envision/scope/spec/build/test/deploy loops the happier I am with the outcomes.

I will say that I am finding the actual code to be somewhat ephemeral for me - the more precise the specifications are and generally the tighter and more elegant the design is, the less the code matters as a long term artifact.

I'm not at the "code is assembler" point yet - but I could see that with more, richer specs I could end up there. Of course the specs are then substantial, but declarative specs can be robust and unambiguous (with sufficient red-teaming review) and - like domain specific languages - reduce the accidental complexity of the syntax when compared to an implementation in a given language.

There are exceptions to all of this, but it's fascinating to see how it's evolving!


I'm generally in agreement with everyone here.

- Some code is ephemeral - it's generated to do the thing, thrown away at the end of the session, and the csv was imported successfully (or whatever). Make sure you have at least some testing of the output or you may find the email is in the last name field for some rows. If possible, have an API your agent uses with rich domain types and validations that force it to do things right or do them again (and that it can't rewrite to relax the constraints!)

- You can one or few shot a real app - for a few users, for a small set of use cases. Scope of this will improve with models, but at least today it's "spelling bee app for my kids" not "salesforce replacement for millions of workers".

- You can add rich validation steps for all types of quality that you care about which (assuming they converge) can deliver high performance, well designed and functionally correct code mostly autonomously.
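The "rich domain types and validations" point can be sketched in a few lines. This is just an illustrative shape, not anyone's real API - the `Contact` type and `import_row` helper are invented - but it shows how a strict constructor forces the agent to redo bad rows instead of silently mis-importing them:

```python
import re
from dataclasses import dataclass

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

@dataclass(frozen=True)
class Contact:
    """A domain type the agent must go through; invalid data is rejected."""
    first_name: str
    last_name: str
    email: str

    def __post_init__(self):
        # Catch the classic shifted-column failure: an email where a name should be.
        if EMAIL_RE.match(self.last_name):
            raise ValueError("email found in last_name field")
        if not EMAIL_RE.match(self.email):
            raise ValueError(f"invalid email: {self.email!r}")

def import_row(row: dict) -> Contact:
    # The agent calls this instead of writing rows directly; a ValueError
    # forces it to redo the row rather than relax the constraint.
    return Contact(row["first"], row["last"], row["email"])
```

Because the type is frozen and the checks live in the API rather than in agent-written glue code, the agent can't "fix" a failing import by loosening the validation.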

I'm building an orchestrator (who isn't). Haven't looked at the code yet, but it appears to work. But man have I spent hours in loops between Claude, Codex and myself all on the highest thinking levels to figure out what interface portability means for the employee, how best to handle "remote" sessions and the appropriate semantics for pipelines/recipes.

I've also been very opinionated about who does what. I'll let the agent write a script to sync with github and reload workers, but I decided to "waste" the 5 minutes to manually do all of the config steps on render for my server when Claude told me that I couldn't just give it read-only scope to pull the logs. Bad news: I'm cutting and pasting for my computer overlord. Good news? Claude can't blow away the prod db if it happens to get in the way of whatever interpretation it makes of the instructions I give it.

A chainsaw requires very different skills than an axe. It has different failure modes. Some experience as a lumberjack probably helps using either/both.

No difference (at least now) with agents.


I have found that for low-stakes stuff, where "good enough" really is ok, Claude and Codex have been pretty great. I don't particularly care if the code is optimal, just enough to do that job.

For example, I had Claude generate a language server for TLA+ so I could have nice keystrokes in Neovim. For things like this, I really do think there is such a thing as "good enough"; a language server doesn't have to be perfect, and the stakes are pretty low, where I think the worst case scenario is that it screws up my code, but that should be relatively easy to catch in Git.

I have been trying to mostly have Claude generate code from specifications: either a Mermaid diagram for simpler stuff, or TLA+ for more complicated stuff. I usually supply a lot of surrounding context about how I want these specs to be implemented, and it will usually get me about 90% of the way there, but I've found that I still need to hack against it to get over the hump.
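For the Mermaid case, the spec-to-code step can be as direct as translating a state diagram into a transition table and replaying behavior against it. A minimal sketch (the `Draft`/`Review`/`Published` workflow here is made up for illustration):

```python
# Transitions hand-derived from a Mermaid stateDiagram such as:
#   stateDiagram-v2
#     [*] --> Draft
#     Draft --> Review: submit
#     Review --> Draft: reject
#     Review --> Published: approve
TRANSITIONS = {
    ("Draft", "submit"): "Review",
    ("Review", "reject"): "Draft",
    ("Review", "approve"): "Published",
}

def run(events, state="Draft"):
    """Replay a sequence of events against the spec; raise on any
    transition the diagram doesn't allow."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"illegal event {event!r} in state {state!r}")
        state = TRANSITIONS[key]
    return state
```

Because the table mirrors the diagram one edge per entry, a reviewer (human or agent) can diff spec against implementation line by line.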

It makes me feel a little valuable; I finally have an excuse to use formal methods for things.


For me the distinction is the quality and rigor of your pipeline.

Vibe coding: one shot or few shot, smoke test the output, use it until it breaks (or doesn't). Ideal for lightweight PoC and low stakes individual, family or small team apps.

Agentic engineering:

- You care about a larger subset of concerns such as functional correctness, performance, infrastructure, resilience/availability, scalability and maintainability.

- You have a multi-step pipeline for managing the flow of work. Stages might be project intake, project selection, project specification, epic decomposition, story decomposition, coding, documentation and deployment.

- Each stage will have some combination of deterministic quality gates (tests must pass, performance must hit a benchmark) and adversarial reviews (business value of proposed project, comprehensiveness of spec, elegance of code, rigor and simplicity of ubiquitous language, etc.)
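One stage of such a pipeline can be sketched as a generate/validate loop. This is a hypothetical minimal shape, not anyone's actual harness - `generate` stands in for a model call, and each gate returns pass/fail plus feedback that gets looped back:

```python
def run_stage(generate, gates, max_attempts=3):
    """Run one pipeline stage: produce an artifact, hold it against every
    quality gate, and feed failures back until all gates pass."""
    feedback = ""
    for _ in range(max_attempts):
        artifact = generate(feedback)
        failures = []
        for gate in gates:
            ok, msg = gate(artifact)
            if not ok:
                failures.append(msg)
        if not failures:
            return artifact
        feedback = "; ".join(failures)  # loop the work back with reasons
    raise RuntimeError(f"stage did not converge: {feedback}")
```

Deterministic gates (tests, benchmarks) and adversarial reviews both fit the same `gate` signature, which keeps the orchestration code indifferent to how each check is implemented.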

And it's a slider. Sometimes I throw a ticket straight into my system because I don't want to do an interview, burn tokens on three rounds of adversarial reviews, estimate potential value and then write a detailed specification with more adversarial reviews just to ship a feature.


If your slider only goes between vibe coding and agentic engineering, you're missing an entire range of engineering where the human is more involved.

I've been using Opus, GPT-5.5, and some lesser models on a daily basis, but not having them handle entire tasks for me. Even when I go to significant effort to define and refine specs, they still do a lot of dumb things that I wouldn't allow through human PR review.

It would be really easy to just let it all slide into the codebase if I trusted their output or had built some big agentic pipeline that gave me a false sense of security.

Maybe 10 years from now the situation will be improved, but at the current point in time I think vibe coding and these agentic engineering pipelines are just variations on the same theme of abdicating entirely to the LLM.

This morning I was working on a single file where I thought I could have Opus on Max handle some changes. It was making mistakes or missing things on almost every turn that I had to correct. The code it was proposing would have mostly worked, but was too complicated and regressed some obvious simplifications that I had already coded by hand. Multiply this across thousands of agentic commits and codebases get really bad.


Next time give it the context required for the task, e.g. an explanation of why you have those hand-coded simplifications, and be amazed at how proper use of a tool works better than just assuming your drill knows what size bit to pick.

I agree, vibe coding does not have quality gate checks at each stage, while agentic engineering does. Dev teams get into trouble when they try to build without a proper process of design, tests, and reviews. This was true before agentic coding, but it's especially true now. The teams that understand how to leverage agents in this process are the ones that will be most successful.

Helps if you hand the validation criteria both to the original agent as strong guidance and then to an adversarial agent as a quality reviewer. The adversarial agent is more likely to loop the work back if it fails the validation criteria.

I do find that just asking the same agent to do and check its own work is not particularly reliable.
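The worker/adversarial-reviewer split described above might look like this in outline. The stub functions stand in for model calls (everything here - criteria text, function names, the "add tests" behavior - is invented for illustration):

```python
CRITERIA = "All public functions need tests; no TODOs left in the code."

def worker(task: str, criteria: str, feedback: str = "") -> str:
    # Stub for the original agent: it gets the criteria up front as
    # guidance, plus any reviewer feedback from the previous round.
    draft = f"solution for {task}"
    return draft + " with tests" if feedback else draft

def reviewer(artifact: str, criteria: str):
    # Stub for the adversarial agent: same criteria, different incentive.
    # Returns None if acceptable, else feedback to loop the work back.
    return None if "with tests" in artifact else "missing tests per criteria"

def do_task(task: str, criteria: str, max_rounds: int = 3) -> str:
    feedback = ""
    for _ in range(max_rounds):
        artifact = worker(task, criteria, feedback)
        feedback = reviewer(artifact, criteria)
        if feedback is None:
            return artifact
    raise RuntimeError("reviewer never accepted the work")
```

The key design point is that worker and reviewer are separate calls with separate roles, so the reviewer has no stake in declaring its own output acceptable.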


Why crack one website when you can crack all of them? For a well funded (especially nation state) attacker, if $1 in compute and effort returns $2 in ransoms, when it's possible to access another n x $1 of compute and if you don't hit diminishing returns or cashflow limitations, why wouldn't you just keep spending $'s until you p0wned all the systems?

If there is only one bear, you just need to run faster than your friends. If there's a pack of them, you need to start training much harder!


Exactly this. I'm writing my own little orchestrator and memory system, and because I have a modest number of workflows, I'm taking the time to specify them deterministically, describe them as a DAG (with gotos for the inevitable loops) and generate deterministic orchestration code. I'm trying to make most of the tool calls as clear and comprehensive as possible (don't make Opus convert a PDF; have a script do that and give it the text instead), and I'm putting in all the state tracking you'd expect, assuming a ~20% task failure rate, so I can simply wipe and repeat failed tasks.

Small model and (where still required) human in the loop steps for deterministic workflows can solve a surprisingly large number of problems and don't depend on the models to be consistent or not to fail.

Just invest heavily in adversarial agents and quality gates and apply transforms on intermediate artifacts that can be validated for some dimensions of quality to minimize drift.


I did the same with my own orchestrator. That's where I get my data.

It's amazing the power a simple workflow with automatic gate enforcement brings to agentic coding.


Seeing plenty of this. The quality of agentic code is a function of the quantity and quality of adversarial quality gates. I have seen no proof that an agentic system is incapable of delivering code that is as functional, performant and maintainable as code from a great team of developers, and enough anecdotes in the other direction to suggest that AI "slop" is going to be a problem that teams with great harnesses will be solving fairly soon if they haven't already.


I take your point but then it makes me think is there no more value in diversity?

[Philosophy disclaimer] So in a code-base diversity is probably a bad idea, ok that makes sense. But in an agentic world, if everything is run through the Perfect Harness then humans are intentionally just triggers? Not even that, like what are humans even needed for? Everything can be orchestrated. I'm not against this world, this is an ideal outcome for many and it's not my place to say whether it's inevitable.

What I'm conflicted on is does it even "work" in terms of outcomes. Like have we lost the plot? Why have any humans at all. 1 person billion dollar company incoming. Software aside, is the premise even valid? 1 person's inputs multiplied by N thousand agents -> ??? -> profit


> Why have any humans at all

Why have humans do work at all? We could have a radically better existence. It would mean that the few at the top of the pyramid lose their privileged position relative to the rest of us, but we could, actually, have that world of abundance for all.

Work in the current sense arguably isn't even desirable

Maybe I've just read too many Culture books.


These are the right questions to ask.


I model this as "stacked sigmoid curves". I have no reason to believe that any specific technological implementation will be exponential in impact vs sigmoidal.

However if we throw enough money and smart people at the problems and get enough value from the early sigmoid curves, the effective impact of a large number of stacked sigmoids could theoretically average to a linear impact, but if the sigmoids stay of a similar magnitude (on average) and appear at a higher velocity over time, you end up with an exponential made up of sigmoids*

* To be fair, it has been so long since I have done math that this may be completely incorrect mathematically - I'm not sure how to model it. However I think in practice more and more sigmoids coming faster and faster with a similar median amplitude is gonna feel very fast to humans very soon - whether or not it's a true exponential.
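A quick numerical sketch of the footnote's intuition, with arbitrary made-up parameters: if equal-sized sigmoids arrive with exponentially shrinking gaps (the k-th curve centered at ln(k+1), so the number underway by time t grows like e^t), their sum does grow roughly exponentially even though every individual curve flattens out:

```python
import math

def stacked_sigmoids(t: float, n: int = 1000, steepness: float = 8.0) -> float:
    """Sum of n equal-sized sigmoid curves; the k-th curve's midpoint is
    at ln(k+1), so midpoints arrive faster and faster over time."""
    return sum(1.0 / (1.0 + ((k + 1) * math.exp(-t)) ** steepness)
               for k in range(n))
```

With these parameters, the ratio stacked_sigmoids(5) / stacked_sigmoids(3) comes out close to e^2, i.e. the stack behaves like e^t over that window - consistent with the comment's claim, though of course real technologies won't arrive on such a tidy schedule.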

I'm honestly having a very hard time thinking through the likely implications of what's currently happening over the next 2-10 years. Anyone who has the answers, please do share. I'm assuming from Cynefin that it's a perturbed complex adaptive system so I can just OODA or experiment, sense and respond to what happens - not what I think might happen.


4.5 and 5.2. Transformative. I know dozens of CTOs who were piloting AI in the fall, took a day to do something real over Xmas and then came back to their orgs with a mandate to double down and experiment with software factories once they saw what the November drops enabled.


There's a reason there are IC and manager tracks at most companies employing devs. You can keep refining your craft as an IC and become a Staff/Principal/Distinguished. No need to become a manager to get promotions, more money and/or more responsibility.

