Hacker News | seanmcdirmid's comments

Ok, I’ll bite: how is that different from humans?

Human behaviour is goal-directed because humans have executive function. When you turn off executive function by going to sleep, your brain will spit out dreams. Dream logic is famous for being plausible but unhinged.

I have the feeling that LLMs are effectively running on dream logic, and everything we've done to make them reason properly is insufficient to bring them up to human level.


Isn’t a modern LLM with thinking tokens fairly goal-directed? But yes, we hallucinate in our sleep, while LLMs will hallucinate details if the prompt isn’t grounded enough.

The thing about dream logic is that it can be a completely rational series of steps, but there's usually a giant plot hole which you only realise the second you wake up.

This definitely matches my experience of talking to AI agents and chatbots. They can be extremely knowledgeable on arcane matters yet need to have obvious (to humans) assumptions pointed out to them, since they only have book smarts and not street smarts.


Assuming this is not a rhetorical question: no, it is not. The only "goal" is to maximize plausibility.

Again, how is that different from humans? I’m not going around trying to prove my code correct when I write it manually.

A prompt for an LLM is also a goal direction and it'll produce code towards that goal. In the end, it's the human directing it, and the AI is a tool whose code needs review, same as it always has been.

Some of my best code comes from my dreams though.

It’s amazing how much you get wrong here, since LLM attention layers are effectively stacked goal functions.

What they lack is multi-turn, long-horizon goal functions, which is being solved to some degree by agents.


LLMs are literally goal machines. It’s all they do. So it’s important that you input specific goals for them to work towards. It’s also why logically you want to break the problem into many small problems with concrete goals.

Do you only mean instruct-tuned LLMs? Or the base (pretrained) model too?

And yet LLMs are incredibly useful as they are right now.

What surprises me about the current development environment is the acceleration of technical debt. When I was developing my skills, the nagging feeling that I didn't quite understand the technology was a big dark cloud. I felt this cloud was technical debt. It was always what I was working against.

The current expectation seems to be that technical debt doesn't matter. The current tools embrace superficial understanding. They paper over the debt. There is no need for deeper understanding of the problem or solution; the tools take care of it behind the scenes.


It’s not. LLMs are just averaging their internet snapshot, after all.

But people want an AI that is objective and right. HN is where people who know the distinction hang out, but it's not what the layperson thinks they are getting when they use this miraculous, super-hyped tool that everybody is raving about.


The etiquette, even at the bigtech place I work, has changed so quickly. The idea that it would be _embarrassing_ to send a code review with obvious or even subtle errors is disappearing. More work is being put on the reviewer. Which might even be fine if we made the further change that _credit goes to the reviewer_. But if anything we're heading in the opposite direction, lines of code pumped out as the criterion of success. It's like a car company that touts how _much_ gas its cars use, not how little.

Review is usually delegated to an AI too

By now, a few years after ChatGPT's release, I don't think anyone believes AI is objective and right; all users have seen at least one instance of it hallucinating or simply being wrong.

Sorry, I can think of so many counterexamples. I also detect a lot of "well, it hallucinates about subject X" (which the person knows well, so they can spot the hallucination), followed by continued trust on subjects Y and Z (which the person knows less well, so they can't spot the hallucinations).

YMMV.


> Briefly stated, the Gell-Mann Amnesia effect works as follows. You open the newspaper to an article on some subject you know well. In Murray's case, physics. In mine, show business. You read the article and see the journalist has absolutely no understanding of either the facts or the issues. Often, the article is so wrong it actually presents the story backward, reversing cause and effect. I call these the "wet streets cause rain" stories. Paper's full of them. In any case, you read with exasperation or amusement the multiple errors in a story, and then turn the page to national or international affairs, and read with renewed interest as if the rest of the newspaper was somehow more accurate about far-off Palestine than it was about the story you just read. You turn the page, and forget what you know.

-Michael Crichton


Sure, Gell-Mann amnesia exists, but remember that its origin is actually human, in the form of newspaper writers. So how can we fully trust humans? In just the same way, AI cannot be fully trusted either.

The current way of doing AI cannot be trusted.

That doesn’t mean the future won’t herald a way of using what a transformer is good at (interfacing with humans) to translate to and interact with something that can be a lot more sound and objective.


You're falling into the extrapolation fallacy, there is no reason to think that the future won't have the same issues as today in terms of hallucinations.

And even if they were solved, how would that even work? The world is not sound and objective.


There are a lot of binary thinkers on HN, but they shouldn’t make up a majority.

It's much easier to fire an employee who produces low-quality/low-effort work than to convince leadership to fire Claude.

You can fire employees who don't review generated code, though, because ultimately it's their responsibility to own their code, whether they hand-wrote it or an LLM did.

It seems to me that it's all a matter of company culture, as it has always been, not AI. Those that tolerate bad code will continue to tolerate it, at their peril.


The volume is different. Someone submitted a PR this week that was 3800 lines of shell script. Most of it was crap and none of it should have been in shell script. He's submitting PRs with thousands of lines of code every day. He has no idea how any of it actually works, and it completely overwhelms my ability to review.

Sure, he could have submitted an ill-considered 3800-line PR five years ago, but it would have taken him at least a week, and there probably would have been opportunities to submit smaller chunks along the way or discuss the approach.


It’s harder when the person doing what you describe has the ability to have you fired. Power asymmetry + irresponsible AI use + no accountability = a recipe for a code base going right to hell in a few months.

I think we’re going to see a lot of the systems we depend on fail a lot more often. You’d often see an ATM or flight-status screen with a BSOD; I think we’re going to see that kind of thing everywhere soon.


Just block that user, that seems to be the way.

Humans have a 'world model' beyond the syntax: for code, an idea of what the code should do and how it does it. Of course, some humans are better than others at this; they are recognized as good programmers.

Papers show that AI also has a world model, so I don't think that's the right distinction.

Could you please cite these papers. If by AI you mean LLMs, that is not supported by what I know. If you mean a theoretical world-model-based AI, that's just a tautological statement.


> My experience is that people who weren't very good at writing software are the ones now "most excited" to "create" with a LLM.

My experience is the opposite. Those with a passion for the field and the ability to dig deeply into systems are really excited right now (literally all that power just waiting to be guided to do good...and oh does it need guidance!). Those who were just going through the motions and punching a clock are pretty unmotivated and getting ready to exit.

Sometimes I dream about being laid off from my FAANG job so I have some time to use this power in more interesting ways than I'm doing at work (although I already get to use it in fairly interesting ways in my job).


I wouldn’t say the pessimists fall into that category.

In my experience they are mostly the subset of engineers who enjoyed coding in and of itself, in some cases without concern for the end product.


I'm using an LLM to write queries ATM. I have it write lots of tests, do some differential testing to get the code and the tests correct, and then have it optimize the query so that it can run on our backend (and optimization isn't really optional, since we are processing a lot of rows in big tables). Without the tests this wouldn't work at all. And not just any tests: we need pretty good coverage, since if some edge case isn't covered, it will likely wash out during optimization (if the code is ever correct about it in the first place). I've had to add edge cases manually in the past, although my workflow has gotten better about this over time.

I don't use a planner though, I have my own workflow setup to do this (since it requires context isolated agents to fix tests and fix code during differential testing). If the planner somehow added broad test coverage and a performance feedback loop (or even just very aggressive well known optimizations), it might work.
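The differential-testing loop described above could be sketched roughly like this. All names and the toy "query" here are hypothetical stand-ins, not the commenter's actual workflow: a trusted reference implementation and the optimized version are run against randomized inputs, and any divergence is flagged.

```python
import random

def reference_query(rows):
    # Naive, obviously-correct version: sum amounts for active users.
    return sum(r["amount"] for r in rows if r["active"])

def optimized_query(rows):
    # Stand-in for the LLM-optimized version under test.
    # (Intentionally equivalent here; in practice this is the rewrite.)
    total = 0
    for r in rows:
        if r["active"]:
            total += r["amount"]
    return total

def differential_test(trials=100, seed=0):
    # Compare both implementations on randomized inputs, including
    # edge cases like empty input and negative amounts.
    rng = random.Random(seed)
    for _ in range(trials):
        rows = [
            {"amount": rng.randint(-10, 10), "active": rng.random() < 0.5}
            for _ in range(rng.randint(0, 20))
        ]
        assert reference_query(rows) == optimized_query(rows), rows
    return True
```

The key point is the same one made above: the loop only catches what the random generator (or hand-added edge cases) actually exercises, so coverage matters more than the number of trials.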


They pay use and payroll taxes. Actually, just moving more of the tax burden over to graduated income and wealth taxes would go a long way to equality. Just taking health insurance into the public system would help. There are lots of easy things we can do, millionaire taxation is kind of just a distraction/strawman.

> Juniors are still getting hired because they're still way cheaper and they're just as capable at using AI as anyone.

Seniors have a much bigger advantage in using AI right now than juniors do. Seniors get to lean on their experience when checking AI results. Juniors rely on the AI's experience instead, which isn't as useful.


China looks like the good guy now, but if Xi decided to “reassert control” over Taiwan, it would quickly become an international pariah and everyone would forget about Trump immediately, the country would immediately be isolated from everyone other than their closest (geographically speaking) allies. Is China ready to do that? Not today, maybe in a decade or two (when they’ve replaced the USA as the top economic/military power, there won’t be severe consequences). Xi is smart enough to wait, taking Taiwan now wins them nothing and loses them everything.

We'd just cut off all of our goods manufacturing and leave the shelves empty? I don't think it's likely.

> We'd just cut off all of our goods manufacturing and leave the shelves empty? I don't think it's likely.

All bets are off if China attacks Taiwan now, I think; it would be hard, but there would be a response like that. In a decade or two, probably not, but more due to China's dominance in the world by that point than just the clout from their ability to make things.

Xi isn't dumb, he isn't going to stir the pot right now, he doesn't have to, China doesn't have much to gain from it. China has nothing but patience.


They could have used inside-government legal analysis that other people didn't have. You could have predicted this with higher certainty if you knew the justices well enough.

Coulda woulda shoulda.

They could have just been smarter than average and found an angle others didn’t see that paid off for them.


Ya, that's why this will be impossible to prove as insider trading.

They already kind of do, but I think anyone who was into US money has already left for it, and the money China is throwing at the problem is pretty good also. You can also have a lot more influence in a Chinese company without having to adopt a weird new American corporate culture.

You can get a 27 inch 5k from Asus for $750. A 31.5 inch 6K goes for around $1200. A 28 inch 4K is around $350-$400.

Anyone reading this, I am begging you to thoroughly test anything that comes out of ASUS before committing. Maybe only purchase with a generous return policy and possibly insurance. They are decent panels, but everything around the panel is horrendous: random connection errors with different machines, poor UX for switching inputs, takes a millennium to boot up and connect to the screen, forget about any support, and if you have built-in speakers you'd be better off with a tin can connected to your computer.

You get what you pay for with ASUS.


I’ve frankly had worse experiences with Samsung and better experiences with LG. The model I have is pretty bare bones, which is much better than the Samsung 27 inch 5k I had that just died on me after a couple of years. The LG 28 inch 4k is going on its 6th year. I think if I buy a 6K, I’ll wait for the LG to come down in price a bit ($2k for LG vs $1300 for Asus on Amazon).

They all suck in their own ways. In my experience LG has random hardware failures (like one audio channel just dying; how? I don't know), is still kinda slow booting (though this has gotten better), and their designs can be hit or miss (terrible stands, aesthetics that aren't ergonomic enough, etc.). Samsung has been better for me but suffers from variations of the above.

These brands all have glowing fans online pushing their products (the flamewars about ASUS made me hesitate to even comment), but they burn their reputations customer by customer, and I guess enough have been burned that Apple is able to maintain enough sales.


You’ll see a lot of MacBooks in Beijing’s Zhongguancun, where all the tech companies are, but they also have a lot of students there as well, so who knows. You need to go out to the suburbs, where Lenovo has offices, to stop seeing them. I know Apple is common in Western Europe from having lived there for two years (but that was 20 years ago; I lived in China for 9 years after that).

It wouldn’t surprise me if the DeepSeek people were primarily using Macs. Maybe Alibaba might be using PCs? I’m not sure.


I would also expect that the DeepSeek devs are using MacBooks. If not, they may be using Linux; Windows is possible of course, but not likely IMHO. I have no knowledge of that area though, so it would be interesting to hear any primary sources or anecdotes.

Deepseek is in Hangzhou, so I guess they are. GDP/capita in Zhejiang is pretty high, even more so for HZ. If you ever visit, it feels like a pretty nice place (especially if you can get a villa around xihu). I also visited ZJU once, and it was pretty Macbooky, but I don't have as much experience there as Beijing's Zhongguancun.
