This became clear to me over the last few years. We are quickly returning to a world of entrenched social hierarchy where there are lords and peasants and little room even for social mobility.
With the corpse of meritocracy too rotted to deny at this point, the elite simply seem to have run out of lies for placating the people.
Or, more likely, the people are so sickeningly impotent that there's no need for the lies anymore. The new aristocracy will prevail over liberalism and everything the West claimed for years was part of its values.
The West has been fighting this since its founding.
“If we are to have another contest in the near future of our national existence, I predict that the dividing line will not be Mason and Dixon's but between patriotism and intelligence on the one side, and superstition, ambition and ignorance on the other.”
― Ulysses S. Grant
No. Not once in the entire history of the human race, from the time we were dwelling in caves to today, not in any tribe, village, hamlet, city, state, kingdom or nation, in no culture or circumstance, has effort ever been rewarded.
It's weird that homo sapiens sapiens has been around for approximately 300,000 years and it's never happened once. Not even once.
Everyone knows someone who worked for years on a project only for it to go nowhere, poured years into a business that failed, or spent years getting a degree that was useless. Effort might be a part of many people's success stories, but it's not the thing that literally gets rewarded. And conversely, many people get rewarded for things that require relatively little effort.
I suppose I should have said that the correlation between effort and reward has never been 1.0 and has often been a lot lower than we like to believe.
>The city is temperate and brightly colored, with plenty of pleasant trees, but on every corner it speaks to you in an aggressively alien nonsense. Here the world automatically assumes that instead of wanting food or drinks or a new phone or car, what you want is some kind of arcane B2B service for your startup. You are not a passive consumer. You are making something.
I recently traveled to San Francisco and as an outsider this was pretty much the reaction I had.
I've been to SF three times, and each time the oddest thing was going down 101 from the airport and seeing cURL commands and "you sped past that just like we sped past Snowflake" and such on billboards. It's like being on another planet where everyone is at work.
(On the other hand, in DC there are ads on the metro for new engine upgrades for fighter jets, and I've gotten used to that.)
I visited L.A. in 2023 and the thing that shocked me was how many billboards were for products that I only ever heard advertised on podcasts. MeUndies, for example.
I think that I shall never see
A billboard lovely as a tree
Indeed, unless the billboards fall
I’ll never see a tree at all.
Song of the Open Road - Ogden Nash
TBH I wouldn't mind if my LLM threw in an "Inshallah" every now and again; it would remind me how skeptical I need to be of its output. (Not just "Inshallah" - same thing if it said "God willing".)
"... over time people become increasingly disinterested in others."
The average person perhaps. I find as I get older that people become more fascinating to me. Maybe I've just gotten better at listening and identifying interesting things about them.
Would agree wholeheartedly with this. Once you drill down into a person, you will eventually find an aspect of them that approaches life in a way you do not, and in a way which increases your appreciation for the depth of human experience if you listen closely enough. The signals the author is clued in on here are superficial to me. Idiosyncratic consumptions, a controversial political take or two? Sure, those can tickle one's curiosity, but they are only entrances to possible points of uniqueness and can be easily faked. Obviously you can't know everyone, nor should you want to, so these are just proxies the author uses to find people they want to spend their limited time with rather than, in my opinion, actual "not-boring" people.
All the people responding saying "You would never ask a human a question like this" - this question is obviously an extreme example. People regularly ask questions that are structured poorly or have a lot of ambiguity. The point of the poster is that we should expect all LLMs to parse the question correctly and respond with "You need to drive your car to the car wash."
People are putting trust in LLMs to provide answers to questions that they haven't properly formed, and acting on solutions that the LLMs haven't properly understood.
And please don't tell me that people need to provide better prompts. That's just Steve Jobs saying "You're holding it wrong" during AntennaGate.
This reminds me of the old brain-teaser/joke that goes something like 'An airplane crashes on the border of x/y; where do they bury the survivors?' The point being that this exact style of question has real examples where actual people fail to answer correctly. We mostly learn as kids, through things like brain teasers, to avoid these linguistic traps, but that doesn't mean we don't still fall for them every once in a while too.
That’s less a brain teaser than running into the error correction people use with language. This is useful when you simply can’t hear someone very well or when the speaker makes a mistake, but fails when language is intentionally misused.
> This is useful when you simply can’t hear someone very well or when the speaker makes a mistake
I have a few friends with pretty heavy accents and broken English. Even my partner makes frequent mistakes as a non-native English speaker. It's made me much better at communicating, but it's also more work and easier for miscommunication to happen. I think a lot of people don't realize this also happens with variation in culture, even among people speaking the same language. It's just that the accent serves as a flag to "pay closer attention". I suspect this is a subtle but contributing cause of miscommunication online and why fights are so frequent.
I'm actually having a hard time interpreting your meaning.
Are you criticizing LLMs? Highlighting the importance of this training and why we're trained that way even as children? That it is an important part of what we call reasoning?
Or are you giving LLMs the benefit of the doubt, saying that even humans have these failure modes?[0]
Though my point is more that natural language is far more ambiguous than I think people give credit to. I'm personally always surprised that a bunch of programmers don't understand why programming languages were developed in the first place. The reason they're hard to use is precisely their lack of ambiguity, at least compared to natural languages. And we can see clear trade-offs with how high-level a language is. Duck typing is both incredibly helpful and a major nuisance (a small sketch below illustrates). It's the same reason even a technical manager often has a hard time communicating instructions. Compression of ideas isn't very easy.
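To make the duck-typing point concrete, here's a minimal Python sketch (my own illustration, not from anyone upthread): the same looseness that makes the happy path effortless also defers the ambiguity to runtime.

    # Duck typing's trade-off: flexible contracts, but errors surface late.
    def total_length(items):
        # Helpful: works for any iterable of len()-able things -- lists,
        # strings, dicts -- with no type declarations required.
        return sum(len(item) for item in items)

    print(total_length(["abc", "de"]))     # 5 -- no ceremony needed
    print(total_length([{"a": 1}, "xy"]))  # 3 -- mixed types still fine

    # Nuisance: the ambiguity only surfaces when the wrong duck waddles in.
    # This fails at runtime, not at compile time:
    # total_length([42, "xy"])  # TypeError: object of type 'int' has no len()

A compiler for a stricter language would reject that last call before the program ever ran, which is exactly the ambiguity/rigor trade described above.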
[0] I've never fully understood that argument. Wouldn't we call a person stupid for giving a similar answer? How does the existence of stupid mean we can't call LLMs stupid? It's simultaneously anthropomorphising while being mechanistic.
I was pointing out that humans and LLMs both have this failure mode, so in a lot of ways it's no big deal/not some smoking gun that LLMs are useless and dangerous, or at least no more useless and dangerous than humans.
I personally would stay away from calling someone, or an LLM, 'stupid' for making this mistake, for several reasons. First, objectively intelligent, high-functioning people can and do make mistakes similar to this, so a blanket judgement of 'stupid' is pretty premature based on a common mistake. Second, everything is a probability, even in people. That is why scams work on security professionals as well as on your grandparents. The probability of fooling a professional may be 1 in 10k while for your grandparents it may be 1 in 100, but that just means the professional needs a lot more phishing attempts thrown at them before they accidentally bite. Someone/something isn't stupid for making a mistake, or even systematically making a mistake; everyone has blind spots that are unique to them. The bar for 'stupid' needs to be higher.
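To put rough numbers on that "everything is a probability" point, a back-of-the-envelope Python sketch (the per-attempt rates are the illustrative ones from above, not real data):

    # Chance that at least one of N independent phishing attempts succeeds:
    # 1 - P(every attempt fails).
    def p_at_least_one(p_single, attempts):
        return 1 - (1 - p_single) ** attempts

    p_pro, p_grandparent = 1 / 10_000, 1 / 100
    for n in (100, 1_000, 10_000):
        print(n, round(p_at_least_one(p_pro, n), 3),
              round(p_at_least_one(p_grandparent, n), 3))
    # 100     0.01   0.634
    # 1000    0.095  1.0
    # 10000   0.632  1.0

So even the professional's blind spot gets hit eventually; the attacker just needs volume.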
There are a lot of 'gotcha' articles like this one that point out some big mistake an LLM made or systemic blind spot in current LLMs and then conclude, or at least heavily imply, LLMs are dangerous and broken. If the whole world put me under a microscope and all of my mistakes made the front page of HN there would be no room left for anything other than documentation of my daily failures (the front page would really need to grow to just keep up with the last hour worth of mistakes more than likely).
I totally agree with the language ambiguity point. I think that is a feature and not a bug. It allows creativity to jump in. You say something ambiguous and it helps you find alternative paths to go down. It helps the people you are talking to also discover alternative paths more easily. This is really important in conflicts since it can help smooth over ill intentions since both sides can try to find ways of saying things that bridge their internal feelings with the external reality of dialogue. Finally, we often really don't know enough but we still need to say something and like gradient descent, an ambiguous statement may take us a step closer to a useful answer.
> I personally would stay away from calling someone, or an LLM, 'stupid' for making this mistake because of several reasons.
I wouldn't. Because there's a difference between calling someone's action stupid and saying that someone is stupid. These are entirely dependent upon the context of the claim. Smart people frequently do stupid stuff. I have a PhD and by some metric that makes me "smart" but you'll also see me do plenty of stupid stuff every single day. Language is fuzzy...
But I think responses like yours are entirely dismissive of what's being shown. What's being shown is how easily they are fooled. Another popular example right now is the cup with a sealed top and open bottom (lol, "world model"?).
> There are a lot of 'gotcha' articles
The point isn't about getting some gotcha, it is about a clear and concise example of how these systems fail.
What would not be a clear and concise example is showing something that requires domain subject expertise. That's absolutely useless as an example to everyone that isn't a subject matter expert.
The point of these types of experiments is to make people think "if they're making these types of errors that I can easily tell are foolish then how often are they making errors where I am unable to vet or evaluate the accuracy of its outputs?" This is literally the Gell-Mann Amnesia Effect in action[0].
> I totally agree with the language ambiguity point. I think that is a feature and not a bug.
So does everybody. But there are limits to natural language and we've been discussing them for quite a long time[1]. There is in fact a reason we invented math and programming languages.
> Finally, we often really don't know enough but we still need to say something and like gradient descent, an ambiguous statement may take us a step closer to a useful answer.
Was this sentence an illustrative example?
Sometimes I think we don't need to say something. I think we all (myself included) could benefit more by spending a bit longer before we open our mouths, or even not opening them as often. There's times where it is important to speak out but there are also times that it is important to not speak. It is okay to not know things and it is okay to not be an expert on everything.
> This is literally the Gell-Mann Amnesia Effect in action.
Absolutely! But there is some nuance, here. The failure mode is for an ambiguous question, which is an open research topic. There is no objectively correct answer to "Should I walk or drive?" given the provided constraints.
Because handling ambiguities is a problem that researchers are actively working on, I have confidence that models will improve in these situations. The improvements may asymptotically approach zero, leading to ever more absurd examples of the failure mode. But that's ok, too. It means the models will increase in accuracy without becoming perfect. (I think I agree with Stephen Wolfram's take on computational irreducibility [1]: that handling ambiguity is a computationally irreducible problem.)
EWD was right, of course, and you are too for pointing out rigorous languages. But the interactivity with an LLM is different. A programming language cannot ask clarifying questions. It can only produce broken code or throw a compiler error. We prefer the compiler errors because broken code does not work, by definition. (Ignoring the "feature not a bug" gag.)
Most of the current models are fine-tuned to "produce broken code" rather than "compiler error" in these situations. They have the capability of asking clarifying questions, they just tend not to, because the RL schedule doesn't reward it.
Producing fewer "compiler errors" and more "broken code" errors is a fundamental failure. The cost of detecting compiler errors is lower than the cost of detecting broken code. If the cost of detecting and fixing broken code increases at the same rate as LLMs "improve", then their net benefit will remain fixed. I asked my five-year-old the above "brain teaser" and he got it right. I did a follow-up: what should he wash at a car wash if he walked there? He said, "my hands." Chat answered with more gibberish.
I agree it is a fundamental failure of the current state of models. I believe it is solvable. The nuance is just that "solving" the problem might not look like what we think of as a solution. Hence the asymptote.
> All the people responding saying "You would never ask a human a question like this"
That's also something people seem to miss in the Turing Test thought experiment. I mean sure just deceiving someone is a thing, but the simplest chat bot can achieve that. The real interesting implications start to happen when there's genuinely no way to tell a chatbot apart.
But it isn't just a brain-teaser. If the LLM is supposed to control, say, Google Maps, then Maps is the one asking "walk or drive" via the API. So when I voice-ask the assistant to take me to the car wash, it should realize it shouldn't show me walking directions.
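A hypothetical sketch of that setup (the tool name and schema are invented for illustration, not Google's real API): once the assistant has to fill in a mode argument, "walk or drive" stops being a brain teaser and becomes a required parameter it must get right.

    import json

    def get_directions(destination: str, mode: str) -> str:
        # Pretend maps backend; the API forces a travel mode to be chosen.
        return f"{mode} directions to {destination}"

    # The assistant's real job: turn "take me to the car wash" into arguments.
    # If it reasons "50 meters away, so walking", you arrive without your car.
    tool_call = {"name": "get_directions",
                 "arguments": {"destination": "car wash", "mode": "driving"}}

    print(get_directions(**tool_call["arguments"]))
    print(json.dumps(tool_call))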
The problem is that most LLMs answer it correctly (see the many other comments in this thread reporting this). OP cherry-picked the few that answered it incorrectly, not mentioning any that got it right, implying that 100% of them got it wrong.
You can see up-thread that the same model will produce different answers for different people or even from run to run.
That seems problematic for a very basic question.
Yes, models can be harnessed with structures that run queries 100x and take the "best" answer, and we can claim that if the best answer gets it right, models therefore "can solve" the problem. But for practical end-user AI use, high error rates are a problem and greatly undermine confidence.
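For concreteness, a toy Python sketch of that kind of harness (the 80/20 answer split is made up purely for illustration):

    import random
    from collections import Counter

    def ask_model(prompt):
        # Stand-in for a real model call: answers "drive" most of the time
        # but sometimes slips to "walk" (assumed error rate, illustrative).
        return random.choices(["drive", "walk"], weights=[0.8, 0.2])[0]

    def best_of_n(prompt, n=100):
        # The harness described above: sample n answers, take the majority.
        answers = Counter(ask_model(prompt) for _ in range(n))
        return answers.most_common(1)[0][0]

    print(best_of_n("The car wash is 50m away. Should I walk or drive?"))

The majority vote is almost always "drive", but an end user asking once gets a single sample, so the per-call error rate is what they actually experience.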
The magic of LLMs is that one LLM can learn everything and then we can clone it. However, if we don't know ahead of time which one will be the best, then we should probably keep a lot of versions with real (mathematically calculated) diversity. Ironically, the DEI peeps were right all along.
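One crude way to make that diversity "mathematically calculated" (my interpretation of the phrase, not an established recipe) is pairwise disagreement between model versions on a shared prompt set:

    from itertools import combinations

    def disagreement(a, b):
        # Fraction of prompts on which two model versions disagree.
        return sum(x != y for x, y in zip(a, b)) / len(a)

    # Hypothetical answers from three cloned-then-diverged versions:
    versions = {
        "v1": ["drive", "walk", "drive", "drive", "walk"],
        "v2": ["drive", "drive", "drive", "drive", "walk"],
        "v3": ["walk", "walk", "drive", "walk", "walk"],
    }

    for (na, a), (nb, b) in combinations(versions.items(), 2):
        print(na, nb, disagreement(a, b))  # keep the most-diverse versions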
My understanding is that it mainly fails when you try it in speech mode, because that usually uses the fastest model. I tried all the major providers yesterday and they were all correct when I typed my question.
Nay-sayers will tell you all OpenAI, Google and Anthropic 'monkeypatched' their models (somehow!) after reading this thread and that's why they answer it correctly now.
You can even see those in this very thread. Some commenters even believe that they add internal prompts for this specific question (as if people aren't attempting to fish ChatGPT's internal prompts 24/7, and as if there aren't open-weight models that answer this correctly).
Exactly! The problem isn't this toy example. It's all of the more complicated cases where this same type of disconnect is happening, but the users don't have all of the context and understanding to see it.
I recently asked an AI a chemistry question which may have an extremely obvious answer. I never studied chemistry, so I can't tell you if it was. I included as much information about the situation I found myself in as I could in the prompt. I wouldn't be surprised if the AI's response was based on the detail that's normally important but didn't apply to the situation, just like the 50 meters.
If you're curious or actually knowledgeable about chemistry, here's what happened.
My apartment's dishwasher has gaps in the enamel from which rust can drip onto plates and silverware. I tried soaking what I presume to be a stainless steel knife, with a drip of rust on it, in citric acid. The rust turned black and the water turned a dark but translucent blue/purple.
I know nothing about chemistry. My smartest move was to not provide the color and ask what the color might have been. It never guessed blue or purple.
In fact, it first asked me if this was high school or graduate chemistry. That's not... and it makes me think I'll only get answers to problems that are easily graded, and therefore have only one unambiguous solution.
I'm a little confused by your question myself. Stainless steel rust should be that same brown color. Though it can get very dark when dried. Blue is weird but purple isn't an uncommon description, assuming everything is still dark and there's lots of sediment.
But what's the question? Are you trying to fix it? Just determine what's rusting?
Thanks, excellent catch! Everyone is saying this is a "brain teaser." However, this reminded me of the LLM that thought it was the Golden Gate Bridge. I hadn't been able to say it (or think it) succinctly. From Anthropic's post: "when we turn up the strength of the "Golden Gate Bridge" feature, Claude's responses begin to focus on the Golden Gate Bridge. Its replies to most queries start to mention the Golden Gate Bridge, even if it's not directly relevant." Here's the link for those interested: https://www.anthropic.com/news/golden-gate-claude
> All the people responding saying "You would never ask a human a question like this"
It would be interesting to actually ask a group of people this question. I'm pretty sure a lot of people would fail.
It feels like one of those puzzles which people often fail. E.g.: 'Ten crows are sitting on a power line. You shoot one. How many crows are left?' People often think it's a subtraction problem and don't consider that animals flee after gunshots. (BTW, ChatGPT also answers 9.)
>People regularly ask questions that are structured poorly or have a lot of ambiguity.
The difference between someone who is really good with LLMs and someone who isn't is the same as with technical writing or working with other people.
Communication. Clear, concise communication.
And my parents said I would never use my English degree.
Other leading LLMs do answer the prompt correctly. This is just a meaningless exercise in kicking sand in OpenAI's face. (Well-deserved sand, admittedly.)
To quote you from an earlier comment of yours: "This is exactly the sensational take (devoid of nuance and information) that we should collectively push back against."
The left has been traditionally anti-capitalist and in favor of improving rights and living conditions. Who on the left is gaining financially from distorting the truth to the level of someone like Larry Ellison, Elon Musk, Peter Thiel, the Koch Brothers, or Jeff Bezos?
In favor of improving living conditions? If and only if it is through their ideology. If it is through something which goes against their ideology, the goalposts move at the speed of light and out comes some rationalization like "it wasn't really important, what we need is community".
When communism had claims to being more productive, growth was the most important thing in the world and the reason we should adopt their ideology. Now look at the 'degrowth' people; what a coincidence that they are literally arguing for worsening living conditions...
That's because Marx was a productivist, and communism was strictly productivist until a decade ago (and is still mainly productivist nowadays).
In a lot of countries, political ecology used to be liberal/capitalist (save a few radical feminists like D'Eaubonne who linked the environment with feminism, but they were a small minority). Basically Blair's 'third way', but with less nuclear (for some reason, although I think this position is losing ground among ecologists) and more electric cars.
The degrowth movement is an offshoot of that ideology. Degrowth is to political ecology what anarchism is to communism: based on a very idealist and hopeful view of humanity.
Communists and ecologists are broadly on the 'left', but were rarely allied until maybe a decade ago, and then only on minor things (communists love nuclear, as it is typically something you don't want a capitalist with 'limited liability' to take care of). And while degrowth might be close to anarchism in some ways, it is very, very dishonest to put them in the same basket.
Policing SOPs in East Asia (incl. Singapore) are different from policing SOPs in the West. Typically people are warned, often multiple times, that they are in danger of experiencing the less kind side of local law. Once the switch is flipped, this gentle hand becomes an iron fist.
I will bet dollars to donuts that the person who was held without charge for decades (mentioned above) was completely not surprised that they were severely punished. They may not have liked the punishment, they may not have agreed with the opaque process, but they almost certainly can’t say that they didn’t know it was coming.
You quoted "reasonable", but nothing you said has any effect on reasonableness.
If someone warns you that they're going to murder you if you post another 5 comments on HackerNews, and keeps you up-to-date with every comment you make, nothing about those warnings makes the subsequent murder after your 5th comment more reasonable than if they hadn't given those warnings.
> You quoted "reasonable", but nothing you said has any effect on reasonableness.
Being notified that you are or have been breaking the law and being told that there will be severe consequences if you don’t stop seems reasonable to me.
It may not be how we do it in the West, but it's hard to argue that this can't be perceived as reasonable.
Let me give you an example that opened my eyes. It’s one of many, but it’s one that you may have heard of.
Michael Fay was caned in Singapore in the 1990s. I was so put off by this that I swore never to go to Singapore. I thought that the punishment far exceeded what could be justified by the crimes he committed (petty stuff like vandalizing cars).
Then, within a 6 month period, I met two families who lived as expatriates in Singapore at the same time, one in the same community.
They all said that MF was a pariah. They also both said that he and his family had been given gradually escalating warnings over a short period of time, with the next to last one being “MF needs to leave Singapore now”, and the last one being “you (his family) and MF need to leave Singapore now”. Apparently the job was too good, so the family stayed. We know the rest of the story.
A decade later, I met a woman who worked in Singapore at the time, and she expressed similar sentiments.
While I still think the punishment was excessive (even reduced to 4 strokes instead of 6), I lost all pity for MF and his family. They knew what was coming, and they either didn't understand the culture they were in or they didn't believe what they were told.
I've seen similar types of policing (with warnings and an explanation of potential consequences) in Japan, China, and South Korea. IMHO, it works the way they want it to (mostly as an early deterrent, with very little prosecution actually taking place). This is one reason why conviction rates are so high in places like Japan: if they make the effort to book you, they have overwhelming evidence, usually collected after the criminal has been warned.
We may not like the laws, we may not like the punishments, but we shouldn’t be surprised by the outcomes.
> If someone warns you that they're going to murder you if you post another 5 comments on HackerNews, and keeps you up-to-date with every comment you make, nothing about those warnings makes the subsequent murder after your 5th comment more reasonable than if they hadn't given those warnings.
Great strawman.
Posting on HN is not against the law (at least where I am).
> Posting on HN is not against the law (at least where I am).
If my scenario were law, it would still be entirely unreasonable.
Besides, law is an entirely fluid concept in general, including in Singapore, but really anywhere, to varying degrees. The US has recently been making this incredibly clear for even the blindest to see, but it was already the case before.