> start charging what the models actually cost to run
The political climate won't allow that to happen. The US will do everything to stay ahead of China, and a rise in prices means a sizeable migration to Chinese models, giving them that much more data to improve their models and pass the US in AI capability (if they haven't already).
But it will also happen, in a way: eventually models will become optimized enough that run costs are more or less negligible from a sustainability perspective.
Calculate the approximate cost of raising a human from birth to having the knowledge and skills to do X, along with the maintenance required to continue doing X. Multiply by a reasonable scaling factor in comparison to one of today's best LLMs (i.e. how many humans, and how much time, to do X n times, vs the LLM).
Calculate the cost of hardware (from raw elements), training, and maintenance for said LLM (if you want to include the cost of research + software, then you'll also have to include the costs of raising those who taught, mentored, etc. the human). Consider that the human usually specializes, while the LLM touches everything. I think you'll find even a rough approximation very enlightening if you're honest in your calculations.
But companies don't have to bear the cost of raising a human from birth, or training them. They only pay the cost of hiring them, and that includes the cost of maintenance.
Add to that the fact that we can't blindly trust LLM output just yet, so we need a meatbag to review it.
A human + LLM will always be more expensive than the LLM alone, until we're at a stage where we can remove the human from the loop.
> But companies don't have to bear the cost of raising a human from birth, or training them.
The costs do exist somewhere though, and must be paid by someone. There's no free lunch, and the human lunch is very likely far more costly than the LLM lunch.
> Add to that the fact that we can't blindly trust LLM output just yet
Can't blindly trust human output either. That's why there are various tiers in roles, from junior-equivalent to senior-equivalent, and the actual user of the product is always the final arbiter. There's ultimately nothing different, except that the LLM iterates on issue resolution in seconds to minutes, whereas the human equivalent takes hours to days.
> Claude Code works on closed source (but decompiled) source
Very likely not nearly as well, unless there are many open source libraries in use and/or the language+patterns used are extremely popular. The really huge win for something like the Linux kernel and other popular OSS is that the source appears in the training data, a lot. And many versions. So providing the source again and saying "find X" is primarily bringing into focus things it's already seen during training, with little novelty beyond the updates that happened after knowledge cutoff.
Giving it a closed-source project containing a lot of novel code means it only has the language and its "intuition" to work from, which is a far greater ask.
I’m not a security researcher, but I know a few and I think universally they’d disagree with this take.
The LLMs know about every previously disclosed security vulnerability class and can use that to pattern match. And they can do it against compiled, and in some cases obfuscated, code as easily as against source.
I think the security engineers out there are terrified that the balance of power has shifted too far toward the finding of closed-source vulnerabilities, because getting patches deployed will still take so long. Not that the LLMs are in some way hampered by novel codebases.
> The LLMs know about every previously disclosed security vulnerability class and can use that to pattern match
Do the reports include patterns that could be matched against decompiled code, though? As easily as they would against proper source? I find it a bit hard to believe.
Many vulnerabilities aren't just pattern matching though; deep understanding of the context in the particular codebase is also needed. And a novel codebase means more attention than usual will be spent grepping and keeping the context in focus. Which will make it easier to miss certain things, than if enough of the context was already encoded in the model weights.
Same thing applies to humans: the better someone knows a codebase, the better they will be at resolving issues, etc.
That's unwanted overhead IMO. And I definitely don't want to be running my regular stuff in containers; like I did a full disable and yank on snap so I never accidentally install anything with it. And every time I get into a situation where I have to reach for docker, I find that I suddenly have to be watching my disk space. Absolutely hate it.
I very much have this problem, but this doesn't solve it. I've tried tracking my installs before and it doesn't work. Thing is, I just install stuff on demand, and never think about recording the installs... until I need that record. Especially when I'm solving an issue. What I need is a universal automatic tracker that just captures it all.
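To sketch what I mean (a hypothetical wrapper, not a real tool; the function name and log path are made up for illustration):

```shell
# Hypothetical sketch: wrap installs so every one is logged automatically.
INSTALL_LOG="${INSTALL_LOG:-$HOME/.install-log}"

tracked_install() {
    # Record timestamp + packages first, then do the real install.
    printf '%s\t%s\n' "$(date -Iseconds)" "$*" >> "$INSTALL_LOG"
    sudo apt-get install -y "$@"
}
```

Even something like this still relies on remembering to use the wrapper, though; a truly universal tracker would have to hook the package manager itself rather than sit in front of it.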
> Every developer on Linux already knows both.
I've been developing on Linux for over 10 years and I don't. It's like exiting vim: whenever I want to do anything beyond running a command or basic variable use, I have to go look up how to do it online. Every time.
Looks like it was downvoted to hell and marked as dead super fast. I leave the flag for "dead" on in my HN settings (it leaves such comments super desaturated), and this seems unusual.
> The database dictates the workflow, hands the LLM a highly constrained task, and runs external validation on the output before the state is ever allowed to advance.
This sounds like where lat.md[0] is headed. Only thing is, it doesn't do task constraints. Generally I find the path these tools are taking interesting.
I looked into lat.md. They are definitely thinking in the same direction by using a CLI layer to govern the agent.
The key difference is the state mechanism. They use markdown; I use an AES-encrypted SQLite database.
Markdown is still just text that an LLM can hallucinate over or ignore. A database behind a compiled binary acts as a physical constraint; the agent literally cannot advance a task without satisfying the cryptographic gates.
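Heavily simplified, the gate idea looks something like this (throwaway key, toy schema, plain SQLite instead of the encrypted store; my tool's actual internals differ):

```python
# Sketch of a "gated state" transition: a task row can only move to "done"
# if the caller presents a valid HMAC over the validated output.
import hmac, hashlib, sqlite3

KEY = b"demo-key"  # in practice, held by the compiled binary, not the agent

def advance(db, task_id, output, tag):
    expected = hmac.new(KEY, output.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        raise PermissionError("validation gate not satisfied")
    db.execute("UPDATE tasks SET state='done' WHERE id=?", (task_id,))

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, state TEXT)")
db.execute("INSERT INTO tasks VALUES (1, 'pending')")

# Only the validator that holds KEY can mint a passing tag:
tag = hmac.new(KEY, b"validated output", hashlib.sha256).hexdigest()
advance(db, 1, "validated output", tag)
```

The point is that the state transition lives behind code the agent can't rewrite: it can only present outputs, and the binary decides whether the gate opens.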
All the prompts I've ever written with Claude have always worked fine the first time. Only revised if the actual purpose changes, I left something out, etc. But also I tend to only write prompts as part of a larger session, usually near the end, so there's lots of context available to help with the writing.
Heh, this is what people who are hostile against AI-generated contributions get. I always figured it'd happen soon enough, and here it is in the wild. Who knows where else it's happening...