jfc. I don't have anything to say to this other than that it deserves calling out.
> You've never seen (or implemented) an A/B test where the test was whether to improve the way e.g. the invoicing software generates PDFs?
I have never in my life seen or implemented an A/B test on a tool used by professionals. I see consumer-facing tests on websites all the time, but nothing silently changing the software on your computer. I mean, there are mandatory updates, which I already consider to be malware, but those are, at least, not silent.
Why are you calling it out? You are interpreting the statement too literally. The point is probably about behavior, not nature. LLMs do not always produce identical outputs for identical prompts, which already makes them less like deterministic machines and superficially closer to humans in interaction. That is it. The comparison can end here.
They actually can, though. The frontier model providers don't expose seeds, but when you run inference on your own hardware, you can set a specific seed for deterministic output and evaluate how small changes to the context change the output under that seed. This is like suggesting that Photoshop would be "more like a person than a machine" if it added a random factor every time you picked a color, shifting the value you selected by +-20%, without exposing a way to lock it. "It uses a random number generator, therefore it's people" is a bit of a stretch.
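The seed-determinism point can be sketched with a toy sampler in plain Python. This is not a real inference stack; the vocabulary, function name, and seeding scheme are all made up for illustration. The point it demonstrates is the one above: with a fixed seed and a fixed context, sampling is fully reproducible, and changing either one changes the output.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "mat"]

def sample_tokens(seed: int, context: str, n: int = 5) -> list[str]:
    """Toy stand-in for LLM sampling: a dedicated RNG, seeded from the
    sampling seed plus the context, picks the next tokens. Local
    inference stacks expose a similar seed knob for reproducibility."""
    rng = random.Random(f"{seed}:{context}")
    return [rng.choice(VOCAB) for _ in range(n)]

# Same seed + same context -> identical output, run after run.
assert sample_tokens(42, "hello") == sample_tokens(42, "hello")

# A different seed (or a small context edit) can change the output,
# which is what lets you study how context changes move the result.
print(sample_tokens(42, "hello"))
print(sample_tokens(43, "hello"))
```

The "randomness" is just an RNG state fed into an otherwise deterministic function, which is the crux of the argument: stochastic sampling is a configuration choice, not evidence of personhood.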
You are right, I was wrong. I think anthropomorphizing LLMs to begin with is kind of silly. The whole "LLMs are closer to people than to machines" comparison is misleading, especially when the argument comes down to output variability.
Their outputs can vary in ways that superficially resemble human variability, but variability alone is a poor analogy for humanness. A more meaningful way to compare is to look at functional behaviors such as "pattern recognition", "contextual adaptation", "generalization to new prompts", and "multi-step reasoning". These behaviors resemble aspects of human capabilities. In particular, generalization allows LLMs to produce coherent outputs for tasks they were not explicitly trained on, rather than just repeating training data, making it a more meaningful measure than randomness alone.
That said, none of this means LLMs are conscious, intentional, or actually understanding anything. I am glad you brought up the seed and determinism point. People should know that you can make outputs fully predictable, so the "human-like" label mostly only shows up under stochastic sampling. It is far more informative to look at real functional capabilities instead of just variability, and I think more people should be aware of this.
What other tool can I have a conversation with? I can't talk to a keyboard as if it were a coworker. Consider this seriously, instead of just letting your gut reaction win. Coding with Claude Code is much closer to pair programming than it is to anything else.
You could have a conversation with ELIZA, SmarterChild, Siri, or Alexa. I would say surely you don't consider ELIZA to be closer to a person than a machine, but then it takes a deeply irrational person to have led to this conversation in the first place, so maybe you do.
Not productive conversations. If you had ever made a serious attempt to use these technologies instead of trying to come up with excuses to ignore them, you would not even think of comparing a modern LLM coding agent to some gimmick like Alexa or ELIZA. Seriously, get real.
Not only have I used the technology, I've worked for a startup that serves its own models. When you work with the technology, it could not be more obvious that you are programming software, and that there is nothing even remotely person-like about LLMs. To the extent that people think so, it is sheer ignorance of the basic technicals, in exactly the same way that ELIZA fooled non-programmers in the 1960s. You'd think we'd have collectively learned something in the 60 years since but I suppose not.
I really don't care where you've worked, to seriously argue that LLMs aren't more capable of conversation than ELIZA, aren't capable of pair programming even, is gargantuan levels of cope.
I didn't make any claims about their utility. I said that they are not like people. They are machines through and through. Regular software programs. Programs that are, I suppose, a little too complex for the average human to understand, so now we have the ELIZA effect applying to an entirely new generation.
"I had not realized ... exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people." -- Eliza's creator
I would doubt that they are just "regular software programs", as explainable AI (and other statistical tracing) has lagged far behind.
If that is the case and the latest models can be explained through their weights and settings, please link a source. I would like to see explainable AI making real progress.