Hacker News | staticshock's comments

I do the same. Gmail gives me a single, standardized interface for opting out of emails: mark it as spam. All the various companies I've given my email to, on the other hand, give me different, either clunky or often outright broken interfaces for opting out. There's no direct financial incentive for them to invest in making ethical, robust opt-out systems.

However well-meaning, collectively all those companies are still just a bunch of sociopaths. This might be a bit dark, but I think a reasonable real world analogy here is stalkers and restraining orders. A stalker isn't motivated to listen to you when you tell them to stop talking to you. That's why you get the restraining order.


They were taught not to read errors because they encountered thousands of errors (in other software) that were less helpful than that one.

Most people have an adversarial relationship with software: it is just the pile of broken glass they have to crawl through on the way to getting their task done. This understanding is reinforced and becomes more entrenched with each new paper cut.


I guess it is a mindset thing. Techies see something like this as a problem to solve. Non-techies often panic at the slightest variance from what they were expecting. See also: https://en.wikipedia.org/wiki/Learned_helplessness

You know the saying, "when you owe the bank a million dollars, that's your problem, but when you owe the bank a billion dollars, that's the bank's problem"?

I suspect the theory behind OpenAI is to grow to be "too big to fail" as fast as they can, because once they cross that threshold, their liquidity/solvency problems will cease to be theirs, and become everyone else's.


LLMs seem to me closer to Kahneman's System 1 than to System 2. When understood in this way, it is obvious why LLMs are bad at counting r's in "strawberries". But it also makes ZEH feel like it couldn't possibly be a useful metric, because it's a System 2 evaluation applied to a System 1 system.

FYI, the LLM letter-counting problem has nothing to do with counting per se, and is instead entirely down to LLMs not getting to see your raw UTF-8 byte stream, but rather having a tokenizer intermediating between you and it, chunking your UTF-8 bytes into arbitrary, entirely-opaque-to-the-LLM token groupings.

Try it for yourself — under the most popular tokenizer vocabulary (https://tiktokenizer.vercel.app/?model=cl100k_base), "strawberry" becomes [str][aw][berry]. Or, from the model's perspective, [496, 675, 15717]. The model doesn't know anything more about how those numbers correspond to letters than you do! It never gets sat down and told "[15717] <=> [b][e][r][r][y]", with single-byte tokens on the right. (In fact, these single-byte tokens appear in the training data extremely rarely, and so the model doesn't often learn to do anything with them.)

Note that LLMs can predictably count the number of r's in "s t r a w b e r r y", because <Count the number of r's in "s t r a w b e r r y"> becomes [Count][ the][ number][ of][ r]['s][ in][ "][s][ t][ r][ a][ w][ b][ e][ r][ r][ y]["]. And that's just a matching problem — [ r] tokens for [ r] tokens, no token-correspondence-mapping needed.
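To make the chunking-vs-matching distinction concrete, here's a toy sketch. The tokenizer here is hypothetical (real BPE tokenizers like cl100k_base use learned merge rules, not greedy longest-match over a three-entry vocab); the token IDs are the ones quoted above.

```python
# Toy illustration (NOT a real tokenizer): "strawberry" gets chunked into
# opaque IDs, so letter counts aren't recoverable from the ID sequence,
# while a spaced-out string maps to roughly one token per character.
vocab = {"str": 496, "aw": 675, "berry": 15717}  # IDs as quoted from cl100k_base

def toy_encode(text):
    """Greedy longest-match chunking over the toy vocab; unseen
    characters fall back to one token per character."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(vocab[text[i:j]])
                i = j
                break
        else:
            tokens.append(ord(text[i]))  # fallback: per-character token
            i += 1
    return tokens

print(toy_encode("strawberry"))  # [496, 675, 15717] -- no 'r' visible anywhere
spaced = list("strawberry")      # stand-in for one-character-per-token input
print(spaced.count("r"))         # 3 -- counting becomes pure token matching
```

The point of the sketch: once each character is its own token, "count the r's" reduces to matching identical tokens, which is exactly the kind of task the architecture handles well.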


>entirely-opaque-to-the-LLM token groupings

This is clearly not the case, any modern (non-reasoning) model easily decomposes words into individual token-characters (try separating them with e.g. Braille spaces...) and does arbitrary tokenization variants if forced with a sampler. It's way deeper than tokenization, and models struggle exactly with counting items in a list, exact ordering, retrieving scattered data, etc. LLM context works a lot more like associative memory than a sequence that can be iterated over. There are also fundamental biases and specific model quirks that lead to this.


> When understood in this way, it is obvious why LLMs are bad at counting r's in "strawberries".

No, it doesn't. It makes sense that they can't count the r's because they don't have access to the actual word, only tokens that might represent parts or the whole of it.


Tokenization is a simplistic explanation which is likely wrong, at least in part. They're perfectly fine reciting words character by character, using different tokenization strategies for the same word if forced to (e.g. replacing the starting space or breaking words up into basic character tokens), complex word formation in languages that heavily depend on it, etc. LLMs work with concepts rather than tokens.

A big part of skill acquisition in humans is moving tasks from System 2 to System 1, freeing up the very scarce thinking resources for ever more complex tasks, which can then in turn be internalized and handled by System 1.

here's another good one: https://terraformindustries.com/


> the death of Payton Isabella Leutner

She's alive, so "attempted murder" would be more appropriate.

i really enjoyed the "we are 3D printers for our thoughts" framing!


Sorry, misremembered the detail on that one!


I think the way to see this is as the organic process of discovering hard-to-game benchmarks. The loop is:

1. People discover things LLMs can kind of do, but very poorly.

2. Frontier labs sample these discoveries and incorporate them into benchmarks to monitor internally.

3. Next generation model improves on said benchmarks, and the improvements generalize to improvements on loosely correlated real world tasks.


For a social network, more information about their users = better ad targeting. It likely gets plumbed into models to inform user profiles.


Look at the actual list. It's primarily questionable AI tools, scrapers, lead generation tools, and other plugins in that vein.

I would guess this is for rate limiting and abuse detection.


Nailed it. I think about prescriptivism / descriptivism in terms of these archetypes:

- "Rule followers" think an org will be better off if everyone agreed on a set of rules to follow. At the boundaries, they will think about establishing new rules to clarify and codify new things. Charitably, I'd add that they might remove rules that are obsolete, but we all know this is not sufficiently true in practice: governments, for example, are much more likely to add new rules than to remove old ones.

- "Rule breakers" think that most rules are suggestions. At the boundaries, they will see rules other people are needlessly bound by, and translate those into strategic openings for whatever game they're playing. For better and for worse, start-up ecosystems are full of people like this.

Rule followers want to be told what's allowed, while rule breakers try to figure out what _should_ be allowed from first principles. At the extreme, they tug the world towards authoritarianism or towards anarchy.

This is obviously a spectrum, so everyone has both of these archetypes in them, albeit in different proportions (e.g. most people pay taxes, but almost no one drives the speed limit).


The main problem here is that real people operate in fuzzy domains. Snapping them into place "with code" won't magically resolve the gray areas inherent to the most valuable real workflows.

Think about the prized "high agency worker." What makes them desirable is the willingness and ability to make well informed, unilateral decisions on matters that are likely not yet organizationally codified, or codified in a way that is "wrong" for the task at hand.

Also, the reason Terraform works is that it is _operational_. As in, it's actual code that runs. If it were mere documentation, it would drift like nobody's business. To make "organizational code" operational, you would need enforcement (a compliance team?) manually keeping the documentation in sync with reality in all of the meat and thought spaces where real work happens.

The only place where this can plausibly be automated is in digital spaces. In fact, I'm surprised the article doesn't go there: "organizational code" starts feeling way more plausible as a definition for AI agents than for real people, specifically because agents are operationalized in digital spaces, where enforcement can be automated.


The most productive workers follow the intent of procedures and take a risk-based approach to following or not following the details.


Speaking about high agency workers, this Company as Code framework reminds me a lot of SaaSiroth introduced in https://youtu.be/dLTUqPue9sQ?si=OIxmP5_D-YZZD2UO&t=200 (KRAZAM: High Agency Individual Contributor).

