> Anthropic’s Super Bowl debut, which drew close attention from rivals, illustrated the risks of that convergence. By centering its message on a promise not to include ads in Claude (a practice OpenAI announced it would begin testing earlier this year), the company bet that its ad-free principles would resonate with audiences more than product familiarity.
> That wager came amid a significant awareness gap. According to an S&P Global 2025 survey of 1,149 respondents, 73% reported using ChatGPT, followed by Google’s Gemini at 41%. Tools like Perplexity and Claude registered in the single digits, with only 7% of respondents saying they use Claude.
> “The vast majority of consumers don’t know what Claude is. Many people aren’t going to really get the whole ‘Oh, we’re comparing ourselves to chatGPT,’” said Williamson.
It's helpful to keep in mind that 'AI Twitter' is a bubble. Most people just don't have that many 'important' notes and calendar items.
People saying 'Claude is now managing my life!11' are like gearheads messing with their carburetor or (closer to this analogy) people who live out of Evernote or Roam
All that said, I've been thinking for a while that tool use and discrete data storage (documents, lists, etc.) will unlock a lot of potential in AI, over just having a chatbot manipulate tokens limited to a particular context window. But personal productivity is just one slice of such use cases.
It's interesting to think about how complex the Wikipedia markup is compared to something like GitHub-flavored Markdown or even standard HTML tables (although it eventually renders into standard HTML, so it's not more complex than the latter once you count all the other HTML elements in addition to <table>).
For example, the Swatch Internet Time infobox is dynamically updated:
{{short description|Alternate time system by watch maker Swatch}}
{{Infobox
| image = [[File:Swatch beat Logo.svg|200px|alt=Logo of Swatch Internet Time]]
| caption = Logo of Swatch Internet Time
| title = Time{{efn|at page generation }} {{purge|(update to view correct time)}}
| label1 = 24-hour time (UTC)
| data1 = {{nowrap|{{#time:H:i:s}}}}
| label2 = 24-hour time (CET)
| data2 = {{Time|CET|dst=no|df-cust=H:i:s|hide-refresh=yes}}
| label3 = .beat time (BMT)
| data3 = {{nowrap|@{{#expr: floor( {{#expr:{{#expr:{{#expr:{{#time:H|now + 1 hour}}*3600}}+{{#expr:{{#time:i}}*60}}+{{#time:s}}}}/86.4}} )}}}}
}}
I always found it ironic that the table syntax is designed to resemble ASCII-art-style tables, and then literally nobody writes it in a way that actually looks like an ASCII-art table.
Yeah, it’s not terribly hard to do that even without AI (Prettier can do it, for example). But there’s a lot of places where the tooling just isn’t available. Then again, it’s probably not a big deal if your GitHub comment markup isn’t perfect.
I think the root of the problem is, almost everything else you use in Markdown is easy to do by hand. There’s just no good syntax for tables like this, I guess.
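The kind of alignment Prettier does for Markdown tables can be sketched in a few lines; this is an illustrative toy, not Prettier's actual algorithm, and it assumes a well-formed GFM table with equal column counts.

```python
def align_md_table(text):
    """Pad each column of a GitHub-flavored Markdown table so the
    pipes line up, roughly what Prettier does on save."""
    rows = [[c.strip() for c in line.strip().strip('|').split('|')]
            for line in text.strip().splitlines()]
    widths = [max(len(r[i]) for r in rows) for i in range(len(rows[0]))]
    out = []
    for i, r in enumerate(rows):
        if i == 1:  # separator row: stretch the dashes to the column width
            cells = ['-' * w for w in widths]
        else:
            cells = [c.ljust(w) for c, w in zip(r, widths)]
        out.append('| ' + ' | '.join(cells) + ' |')
    return '\n'.join(out)
```

GitHub renders the padded and unpadded forms identically, which is exactly why nobody bothers to align by hand.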
I agree it's great, but that risk is so major that I stopped using it. "There's a 50% chance that your editor will invisibly corrupt the data you enter, and another 30% chance to corrupt the entire file" is just not usable...
Especially in Zed, where the only way to switch to hard tabs is buried in the settings and impossible to change per buffer.
Lack of control over your editor's behaviour shouldn't be acceptable at this level. Just like making tabs/spaces visible, control like this should be a basic feature of every editor.
You'd think more editors would be smart enough to recognize that it's a TSV file and therefore should preserve the tabs, in much the same way that you'd think editors would be smart enough to recognize that something's a Makefile and therefore should preserve the tabs.
It gets tricky when you have a TSV inside Markdown. I don’t think I’ve ever seen tabs used for indentation in Markdown in the wild, though it probably does work.
We could, however, make the Tab key insert spaces if the cursor is in the beginning of a line, and a literal \t if it’s in the middle. This way, you can write a TSV table pretty much anywhere you want.
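The proposed Tab-key heuristic is simple enough to sketch; this is a hypothetical keybinding handler (the function and parameter names are mine), not any editor's real API.

```python
def handle_tab(line_before_cursor, tab_width=4):
    """Hypothetical editor keybinding: Tab inserts spaces when used
    for indentation at the start of a line, but a literal tab
    elsewhere (e.g. between TSV cells), as suggested above."""
    if line_before_cursor.strip() == '':
        return ' ' * tab_width  # only whitespace so far: indenting
    return '\t'                 # mid-line: hard tab for a TSV column
```

The obvious failure mode is a table whose first column starts at column zero, where the very first separator would be misread as indentation.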
I'm not very familiar with this model series, but I tried it in a conversation while prototyping a potential 'Vibesbench' (my eval). Here's Kimi on a RoboCop scene:
kimi k2 thinking turbo
"He’s not being poetic; he’s weaponizing nostalgia. Quoting Casablanca—the ultimate “beautiful sacrifice” romance—to taunt Morton about corporate backstabbing is pure executive sleaze. It’s the same move as his mahogany desk and flag pin: draping sentimental Americana over naked power. He’s essentially saying, “Isn’t it romantic, Bob? You fought the good fight, now you die.” "
grok 4.1
"So when Dick Jones smirks and says, “Time goes by, it’s the same old story… the fight for love and glory, huh, Bob?” that’s pure Ed Neumeier flexing his love of old Hollywood while simultaneously mocking how these psychopathic ’80s execs think they’re in some grand cinematic drama instead of just murdering a guy for market share."
chatgpt-4o
"The reason the Casablanca reference matters is RoboCop itself is partially a noir—and also a tragedy built inside a genre engine. [… ] So yeah, calling back to Casablanca is not just some meta-joke—it’s a knowing wink that RoboCop is a noir trapped in cybernetic skin. // Even Clarence Boddicker, for all his cartoonish sadism, plays like a twisted stand-in for a noir heavy—talking smooth, dropping nicknames, throwing out casual violence like it’s a punchline."
opus 4.5
"Jones is performing avuncular menace. He's the kind of executive who'd quote old movies while having you murdered, because it's all performance to him, all style. The substance is just power."
It's puzzling to me that all this theorizing doesn't just look at the actual effects of AI. It's very non-intuitive
For example the fact that AI can code as well as Torvalds doesn't displace his economic value. On the contrary he pays for a subscription so he can vibe code!
The actual work AI has displaced is stuff like freelance translation, graphic illustration, 'content writing' (writing SEO-optimized pages for Google), etc. That's instructive, I suppose: if your income source can already be put on Upwork, then AI can displace it.
So even in those cases there are ways to not be displaced. Diplomatic translation work, for instance, can be part of a career rather than just a task, so the tool doesn't replace your 'job'.
He used it to generate a little visualiser script in python, a language he doesn't know and doesn't care to learn, for a hobby project. It didn't suddenly take over as lead kernel dev.
I think AI displacing graphics illustrators is a tragedy.
It's not that I love ad illustrations, but it's often a source of income for artists who want to be doing something more meaningful with their artwork. And even if I don't care for the ads themselves, for the artists it's also a form of training.
As someone who has to switch between three languages every day, fixing the text is one of my favourite usages of LLMs. I write some text in L2 or L3 as best as I can, and then prompt an LLM to fix the grammar but not change anything else. Often it will also explain if I'm getting the context right.
That being said, having it translate to a language one doesn't speak remains a gamble; you never know it's correct, so I'm not sure I'd dare use it professionally. Recently I was corrected by a marketing guy who is a native speaker of yet another language, because I used a ChatGPT translation for an error message. Apparently it didn't sound right.
Re displacing freelance translation, yes - it can displace the 95% of cases where 95% accuracy is enough. Like you mention though, for diplomatic translations, court proceedings, pacemaker manuals etc you're still going to need a human at least checking every line since the cost of any mistake is so high
Both Trump presidencies have really shown how little check there is on the White House when it comes to coordinating among these agencies. Heck, literally one of the first things he did in Jan 2017 was try to find out which park ranger posted a sparse inauguration photo. It wouldn't even occur to me that he was the de facto boss of millions of people in this way.
Because consider the previous status quo: it was considered somehow scandalous for Bill Clinton to have an opinion on what his AG Janet Reno was doing.
Fischer plays 17... Be6, leaving his Queen hanging.
Standard narrative: "Fischer offers his Queen for a mating attack!"
Engine reality: 17... Be6 is the correct move. Trying to save the Queen actually loses the advantage.
Byrne taking the Queen (18. Bxb6) was a massive blunder. The engine actually wants Byrne to ignore the Queen and trade off Fischer's Knight on c3. He ends up with a Queen stranded on a3, a total spectator
Right. So I guess that's my quibble with the term sacrifice (shared by Rudolf Spielmann)
But what's interesting to me is the counterfactual: outside of these 3 queen moves, he would have lost the entire advantage. So it was a tactical shot, like capturing the Golden Snitch in Harry Potter.
Sure, I get what you're saying. It's still a sacrifice, but the compensation is just mate in two, so there's no real "sacrifice" here.
That being said, any sacrifice that doesn't guarantee a better (or at least equal) position isn't a sacrifice either, it's just "hope chess", aka a bad move. In Blitz or Bullet you can make the case for a "bad" sacrifice for positional complexity and putting time pressure on your opponent to make accurate defensive moves.
In the Opera game, Black just played a poor game start to finish. Giving up the bishop for the knight, pushing the B pawn while the king wasn't castled.
I had two “brilliant” moves in one chess.com game today. One was a bishop sacrifice that would have led to mate in three. The other was a queenside castle that the engine wanted me to do sooner. I suck at chess, although I did see the bishop sacrifice as the right move. The engine rated me at 1500 for the game.
This is a common theme, gambits are such depending on what your level and calculation depth is.
The Queen's Gambit opening (almost inarguably a gambit, as it's part of the well-accepted name of a second move) really isn't a gambit in the sense that you can always recover the pawn; however, it is a gambit in the sense that you temporarily give it up.
If we were particularly short-sighted, no doubt, responding to an early white bishop threat on g5 or b5 with a knight on f6 or c6 would look like a gambit, since we are sacrificing the knight, but lo and behold, we regain the minor piece afterwards with xf6 or xc6!
The distinction would be whether the gambit or sacrifice is solid or refutable. But it is in both cases a sacrifice.
When I see Dario say all this I wonder what he's seeing that seems so differentiated from my POV
I wonder how much of Anthropic's success is based on a hoard of prompt-to-code examples that they post-train on. We shouldn't assume Opus's web dev skills are the same as general intelligence.
Here's something Sonnet 4 did (Jul 2025) that I don't think a human intern would do when I made a wikisource tool (transcript follows)
#-#-#
F:
let’s see how the wikisource tool wrks..
let’s try to match “Mary hath chosen that good part” in one of the bible sources they should have
Claude:
Great idea! Let me first search for some biblical content on WikiSource, then use the content match tool.
wiki_source_search
Perfect! Let me try the Douay-Rheims version of Luke, which should have the Mary and Martha story:
wiki_source_page_content_match
No matches! Let me try a slightly different phrasing – maybe it’s “hath chosen the good part”:
wiki_source_page_content_match
Still nothing. Let me try searching for just “Martha” to see if the story is there:
F:
dummy .. why would use kjv wording in a different edition lol
Claude:
Haha you’re absolutely right! I was being completely silly – of course the KJV wording “Mary hath chosen that good part” wouldn’t be in the Douay-Rheims translation. Different translations use different language entirely.
What I Did Wrong:
• Used King James Version phrasing in a Catholic translation
• The Douay-Rheims says “Mary hath chosen the best part”
• I was looking for “good part” (KJV wording)
Should Have Started With:
Just searching for “Martha” or “Mary” to see what the actual text says in that translation, then used the exact wording I found.
#-#-#
It's almost comedic.. "Perfect! Let me try the Douay-Rheims version of Luke" is where if you were watching a horror movie you'd say "Don't go down to the basement fridge!"
And I think a human when they first see the text match fail would go "hmm--ohh" but Claude starts doing permutations. This I guess is the side effect of Reinforcement Learning and system prompts that amount to saying: "Just do it. Don't ask questions. Just do it."
I find one-off anecdotal examples like this to be a bit like discourse around global warming - "Look at that ridiculous polar vortex we had this week! Global warming can't possibly be a thing!" Of course, a trend line comprises many points, and not every point falls perfectly in the center of the line! I'm not necessarily saying you are right or wrong, but your argument should address the line (and ideally give some reason why it might falter) rather than just a single point on that line.
Ah, but I'm not arguing about the rate of change in the trend. I'm saying the signals are decoupled. That is to say, an LLM can be as good a programmer as Linus Torvalds without having even the basic knowledge-generalization abilities we assume the median human with no specialized skills would have (when given the same knowledge an LLM has).
I think most LLM proponents would say that "basic knowledge-generalization abilities" is on a different, slower trend line.
I mean, you aren't very surprised that your CPU can crush humans at chess but can barely run an image classifier, right? But you probably wouldn't say (as you are saying with LLMs) that ability for a CPU to play chess is "decoupled" from classifying images. Increases in CPU speed improve both. You'd just say that one is a lot harder than the other.
I was asking it to see if the wikisource tools were working by looking up a Bible quote. There was no ambiguity about the task itself; what I'm saying is that Claude 'knows' a bunch of things (the Bible has different translations) that it doesn't operationalize when doing a task--issues that would be glaringly obvious to a human who knows the same things.
Maybe I'm missing the point as well, but what did it do wrong?
It seemed like you wanted to see if a search tool was working.
It looked to see. It tried one search using one data source (KJV) and found no matches. The next question would be: is the quote not in there, is the quote mis-remembered, or is there something wrong with the data source? It tries an easier-to-match quote and finds nothing, which it finds odd. So the next step in debugging is to assume a hypothesis that the KJV datasource is broken, corrupted, or incomplete (or not working for some other reason). So it searches for an easier quote using a different datasource.
The next bit is unclear because it looks like you may have interrupted it, but it seems like it found the passage about Mary in the DR data source. So using elimination, it now knows the tool works (it can find things) and the DR data source works (it can also find things), so back to the last question of eliminating hypotheses: is the quote wrong for the KJV datasource, or is that datasource broken?
The next (and maybe last) query I would do, and what it chose, was to search for something guaranteed to be there in the KJV version: the phrase 'Mary'. Then scan through the results to find the quote you want, then re-query using the exact quote you know is there. You get three options.
If it can't find 'Mary' at all in the KJV dataset, then the datasource is likely broken. If it finds 'Mary' but the results don't contain the phrase, then the datasource is incomplete. If the results do contain the phrase, then search for it; if that search fails, you've narrowed the issue down to "phrase-based search seems to fail". If it does find it, and it's the exact quote it searched for originally, then you know search has an intermittent bug.
This seemed like perfect debugging to me - am I missing something here?
And it even summarized at the end how it could've debugged this process faster: don't waste a few queries up front trying to pin down the exact quote; search for "Mary", get a quote that is in there, then search for that quote.
This seems perfectly on target. It's possible I'm missing something though. What were you looking for it to do?
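The elimination procedure above can be written out as a small decision function. This is just a sketch of the reasoning; `search` is a hypothetical helper standing in for the wikisource tool, and all names here are illustrative, not a real API.

```python
def diagnose_search(search, exact_quote, datasource='KJV'):
    """Run the three-way elimination described above against one
    datasource: broken source, incomplete source, or broken
    phrase-based search."""
    hits = search(datasource, 'Mary')  # a term guaranteed to appear
    if not hits:
        return 'datasource likely broken'
    if not any(exact_quote in h for h in hits):
        return 'datasource incomplete'
    if not search(datasource, exact_quote):
        return 'phrase-based search fails'
    return 'search works; original failure was intermittent'

# Toy stand-in for the tool, to show the flow end to end:
corpus = {'KJV': ['Mary hath chosen that good part']}
def fake_search(ds, phrase):
    return [p for p in corpus.get(ds, []) if phrase in p]
```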
What I was expecting is that it would pull up the KJV using the results returned from the wiki_source_search tool, instead of going to a totally different translation and then doing a text match for a KJV quote.
I'm beginning to think that origin stories are an underrated way to find these angles. Like why exactly did you start thinking about this topic. I guess the recipe bloggers were on to this with their long rambles about where they first tried this dish (albeit it may have been for SEO too...)