dwaltrip · 2026-03-02T23:46:30 1772495190

Next level:

Have the LLMs generate tests that measure the “ease of use” and “effectiveness” of coding agents using the language.

Then have them use these tests to get data for their language design process.

They should also smoke test their own “meta process” here. E.g., write a toy language that should be obviously much worse for LLMs, and then verify that the effectiveness tests produce a result agreeing with that.
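A rough sketch of that smoke test, assuming each agent run boils down to a pass/fail per task. Every name here (`TaskResult`, `success_rate`, `benchmark_detects_bad_language`, the `min_gap` threshold) and all of the sample data are hypothetical, made up for illustration, not taken from any real harness:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """One agent attempt at one task in one language (hypothetical schema)."""
    language: str
    task: str
    passed: bool


def success_rate(results, language):
    """Fraction of recorded tasks the agent passed in the given language."""
    relevant = [r for r in results if r.language == language]
    if not relevant:
        raise ValueError(f"no results for {language!r}")
    return sum(r.passed for r in relevant) / len(relevant)


def benchmark_detects_bad_language(results, candidate, control, min_gap=0.2):
    """Sanity check on the benchmark itself.

    The deliberately bad control language should score at least `min_gap`
    below the candidate. If it does not, the benchmark is probably not
    measuring what we hoped. The 0.2 default is an arbitrary assumption.
    """
    gap = success_rate(results, candidate) - success_rate(results, control)
    return gap >= min_gap


# Made-up results: "newlang" is the design under test, "badlang" is the
# toy language written to be obviously worse for LLMs.
results = [
    TaskResult("newlang", "fizzbuzz", True),
    TaskResult("newlang", "parse_csv", True),
    TaskResult("newlang", "http_client", True),
    TaskResult("newlang", "binary_tree", False),
    TaskResult("badlang", "fizzbuzz", True),
    TaskResult("badlang", "parse_csv", False),
    TaskResult("badlang", "http_client", False),
    TaskResult("badlang", "binary_tree", False),
]

print(benchmark_detects_bad_language(results, "newlang", "badlang"))
```

If the check comes back false, the benchmark can't tell a good language from a bad one, so any design data it produces is suspect. A real version would also want repeated runs per task, since agent output is noisy.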

I await the blog post :)

fcatalan · 2026-03-03T02:23:45 1772504625

Ugh sounds like work, we are vibing here. Or can we also vibe-science? :)

dwaltrip · 2026-03-02T19:06:15 1772478375

Cough factorio cough :)

number6 · 2026-03-02T20:33:54 1772483634

Or kubernetes, the factorio for Ops

joquarky · 2026-03-02T20:33:19 1772483599

As someone with a default mode network that is stuck in the "on" position, that game is the only one that I had to quit playing for my mental health.

dwaltrip · 2026-02-24T04:10:43 1771906243

Please don’t post slop as a comment.

dwaltrip · 2026-02-24T04:08:03 1771906083

You left out the first half of the prompt: “I want to wash my car”.

isatty · 2026-02-24T05:36:34 1771911394

Yeah I see this argument being made that it’s ambiguous for humans. Uh, no? Why on earth would I walk to the car wash when I want to wash my car?

sparky_z · 2026-02-24T05:53:32 1771912412

By the same reasoning, why on earth would a person sincerely ask you that question unless the car that they want to wash is either already at the car wash, or someone is bringing it to them there for some reason?

If it's as unambiguous as you say, then the natural human response to that question isn't "you should drive there". It's "why are you fucking with me?" Or maybe "have you recently suffered a head injury?"

If you trust that the questioner isn't stupid and is interacting with you honestly, you'd probably just assume that they were asking about an unusual situation where the answer isn't obvious. It's implicitly baked into the premise of the question.

snovv_crash · 2026-02-24T06:52:16 1771915936

The fact that this is so obvious to humans is why there's no training data that LLMs can use to know the answer.

malfist · 2026-02-24T13:41:17 1771940477

How could the car already be at the car wash if you have the option to drive it there?

Maxion · 2026-02-24T15:25:09 1771946709

You might own multiple cars, you might be borrowing someone else's and so forth.

malfist · 2026-02-24T16:21:26 1771950086

That still doesn't make sense. I'm going to use another car, or borrow a car, to drive to a carwash where the car I want to wash is and then... I guess leave it there? Or leave the car I came in?

This isn't a viable out for explaining why AI can't "reason" through this.

sparky_z · 2026-02-24T20:16:21 1771964181

But why would they reason through it in that way? You haven't asked them to listen carefully and find the secret reason you're a dumb-ass in order to prove how smart they are. If they default to that mode on every query, that would just make them insufferable conversational partners, which is not the training goal.

Let me put it this way. If you were to prefix the prompts they used with "This is an IQ test: ", I wouldn't be surprised if most of the models did much better. That would give them the context that the humans reading this article already have.

1718627440 · 2026-02-24T12:11:17 1771935077

You already brought the car there earlier? You bought a new car and negotiated that you get it washed, so you want to collect it? You have a butler? You plan to get someone or something from the car wash to do it at home, because the car you want to wash is dead?

dwaltrip · 2026-02-16T23:52:40 1771285960

It’s quite a difference…

The expected or assumed signal can differ radically from the perceived signal, often in surprising ways.

People spend so much energy doing things based on untrue assumptions about what others are thinking.

And this is before we even get into how much one should adjust their behavior based on someone else’s perception.

simon666 · 2026-02-17T00:16:46 1771287406

Yeah similarly we can make a few distinctions here: 1) Intended signal, true 2) Unintended signal, but true 3) Unintended signal, but false (Sure, 1' intended but false; though not really important here)

When (1) obtains we can describe this situation as one where sender and receiver coordinate on a message.

When (2) obtains we can say the sender acted in a way that is indicative of some fact or other and the receiver recognizes this; (2) can obtain when (1) obtains, as a separate signal, or when the sender hasn't intended to send a signal.

(3) obtains when the receiver attributes to the sender some expressive behavior or information that is inaccurate, say, because an interpretive schema has characterized the sender and the coding system incorrectly, producing an interpretation that is false.

marcus_holmes · 2026-02-17T00:29:52 1771288192

Also remember that each recipient of the signal will have their own reaction to it. What signals professional competence to one person can signal lickspittle corporate toadying to another.

coldtea · 2026-02-17T01:29:15 1771291755

Yes, but in aggregate, most people (or most groups of people) will arrive at the same conclusion for the same signal.

Else signals and signalling wouldn't be a thing and people wouldn't care for them; their reception would be a random scatter plot.

dwaltrip · 2026-02-15T22:28:35 1771194515

What tools do you use for wireframes / how are you generating them?

dwaltrip · 2026-02-10T20:26:41 1770755201

Much better sources of iron are available.

More likely we get smooshed unintentionally as the AIs seek those out.

jacquesm · 2026-02-10T20:48:53 1770756533

We need it all... oh, wait, you're not silicon... sluuuuuurrrrpp...

dwaltrip · 2026-02-10T20:21:09 1770754869

Wow yeah very prescient.

dwaltrip · 2026-02-06T22:22:49 1770416569

It’s clear enough but they aren’t going out of their way to make it obvious. It’s definitely fluffed up / corporately sanitized.

phist_mcgee · 2026-02-06T22:43:54 1770417834

Damage control to limit the rush for the exits?

dwaltrip · 2026-02-05T21:31:51 1770327111

The CEOs aren't here in the comments.

LinXitoW · 2026-02-06T03:31:55 1770348715

Which is why we ought to always bring up their BS every time people try to pretend it didn't happen.

The promises made are ABSOLUTELY relevant to how promising or not these experiments are.

pertymcpert · 2026-02-06T06:58:51 1770361131

I bet you get upset when you buy a new iPhone and don't love it, because Tim Cook said on the ad that they think you're going to love it.

emp17344 · 2026-02-06T15:21:11 1770391271

It cannot be overstated how absurd the marketing campaign for AI was. OpenAI and Anthropic have convinced half the world that AI is going to become a literal god. They deserve to eat a lot of shit for those outright lies.