prng2021's comments

How is anyone predicting timelines for AGI when these systems can’t do basic addition of 2 arbitrary numbers with 100% accuracy?

Can you do basic addition of 2 arbitrary numbers with 100% accuracy (no tools)? No, you can't. You will make mistakes for a sufficiently large N even with pen and paper, and for a very small N without. Are you no longer generally intelligent?

No, but I can develop methods to eventually do it.

Somewhere along the line, a $10,000 GPU has to be at least equivalent to using a finger to do arithmetic in the dust.

LLMs should use tool calling (which is 100% reliable) instead of doing math internally. But in general, it would be nice to be able to teach a process and have the AI execute it deterministically. In some sense, reliability between 99% and 100% is the worst, because you still can't trust the output, yet verifying it feels like wasted effort. Maybe code gen and execution will get us there.
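
For example, here's a minimal sketch of that pattern in Python. The tool registry and dispatch shape are hypothetical, not any particular vendor's API; the point is just that the model emits a structured call and the runtime does the arithmetic deterministically.

  # Illustrative sketch: route arithmetic to a deterministic tool instead
  # of letting the model compute it in-weights. The registry and dispatch
  # shape are hypothetical, not a specific vendor's API.
  import json

  def add(a: int, b: int) -> int:
      # Exact integer addition; Python ints are arbitrary precision.
      return a + b

  TOOLS = {"add": add}

  def dispatch(tool_call_json: str) -> str:
      # The model emits a structured call instead of computing internally,
      # e.g. {"name": "add", "arguments": {"a": 31415926, "b": 27182818}}
      call = json.loads(tool_call_json)
      result = TOOLS[call["name"]](**call["arguments"])
      return json.dumps({"result": result})

  print(dispatch('{"name": "add", "arguments": {"a": 31415926, "b": 27182818}}'))
  # -> {"result": 58598744}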

This is the exact problem CognOS was built to solve.

99% reliable means you still can't remove the human from the loop, because you never know which 1% you're in. The only way to actually trust output is to attach a verifiable confidence signal to each response, not just hope the aggregate accuracy holds.

We built a local gateway that wraps every LLM output with a trust envelope: decision trace, risk score, and an explicit PASS/REFINE/ESCALATE/BLOCK classification. The point isn't to make LLMs more accurate; it's to make their uncertainty legible so the human knows when to step in.

Open source if you want to look at the architecture: github.com/base76-research-lab/operational-cognos
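
For illustration only, here's a minimal sketch of what such a trust envelope could look like. The fields, names, and thresholds below are hypothetical stand-ins, not the actual CognOS schema; see the repo for that.

  # Hypothetical sketch of a "trust envelope" wrapper. Field names and
  # risk cutoffs are illustrative, not the real CognOS schema.
  from dataclasses import dataclass
  from enum import Enum

  class Action(Enum):
      PASS = "pass"          # low risk: deliver the output as-is
      REFINE = "refine"      # medium risk: retry or self-correct first
      ESCALATE = "escalate"  # high risk: route to a human reviewer
      BLOCK = "block"        # unacceptable risk: withhold the output

  @dataclass
  class TrustEnvelope:
      output: str
      decision_trace: list[str]  # steps taken to score this response
      risk_score: float          # 0.0 (safe) .. 1.0 (risky)
      action: Action

  def wrap(output: str, risk_score: float, trace: list[str]) -> TrustEnvelope:
      # Cutoffs are illustrative only; a real gateway would calibrate them.
      if risk_score < 0.2:
          action = Action.PASS
      elif risk_score < 0.5:
          action = Action.REFINE
      elif risk_score < 0.8:
          action = Action.ESCALATE
      else:
          action = Action.BLOCK
      return TrustEnvelope(output, trace, risk_score, action)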

"reliability between 99% and 100% is the worst because you still can't trust the output"

“I think the decision was made because the people making this decision at Anthropic are well-intentioned, driven by values, and motivated by trying to make the transition to powerful AI to go well.”

Every single one of these CEOs happily pirated unimaginable amounts of copyrighted content. That directly hurt millions of real human beings: not just existing creators, but also new ones whose future potential for success it crushed.

https://www.susmangodfrey.com/wins/susman-godfrey-secures-1-...


“You never needed 1000s of engineers to build software anyway”

What is the point of even mentioning this? We live in reality, and in reality there are countless companies with thousands of engineers building each piece of software. Outside of reality, yes, you can talk about a million hypothetical situations. Cherry-picking a rare example like Winamp does nothing but point to an exception, which, yes, also exists in the real world.


IMHO, it's flamebait. Your quoted text is provably false (e.g., MSFT Windows, AWS, etc.) and appeals to the idealism of lean project teams.


This was a great article. The section “Training for the next state prediction” explains a solution using subagents. If I’m understanding it correctly, we could test whether that solution is directionally correct today, right? I ask an LLM a question. It comes up with a few potential responses but first sends those to other agents in a prompt with the minimum required context. Those subagents can even do this recursively a few times. Eventually the original agent collects and analyzes the subagents’ responses and responds to me.
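
Something like this rough sketch, where `ask_llm` is a hypothetical stand-in for whatever model call you'd actually use:

  # Hypothetical sketch of the recursive subagent review described above.
  def ask_llm(prompt: str) -> str:
      # Placeholder; swap in a real model call. Echoes for demonstration.
      return f"[model reply to: {prompt[:40]}...]"

  def review(candidate: str, question: str, depth: int) -> str:
      # A subagent sees only minimal context: the question and one candidate.
      critique = ask_llm(f"Question: {question}\n"
                         f"Candidate answer: {candidate}\n"
                         "Critique this answer briefly.")
      if depth > 0:
          # Critiques can themselves be reviewed, recursively.
          critique += "\n" + review(critique, question, depth - 1)
      return critique

  def answer(question: str, n_candidates: int = 3, depth: int = 1) -> str:
      candidates = [ask_llm(question) for _ in range(n_candidates)]
      critiques = [review(c, question, depth) for c in candidates]
      # The original agent collects and analyzes the subagents' responses.
      bundle = "\n\n".join(f"Candidate: {c}\nCritique: {k}"
                           for c, k in zip(candidates, critiques))
      return ask_llm(f"Question: {question}\n{bundle}\nWrite the best final answer.")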


Any attempt at world modeling using today's LLMs needs a goal function for the LLM to optimize. The LLM needs to build, evaluate, and update its model of the world. Personally, the main obstacle I've found is in updating the model: the data can be large, and I don't think LLMs are good at finding correlations.
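
As a sketch of that loop (every name below is a made-up stand-in, not a real library): build a model, score it against the goal function, and fold new data in as updates, the update step being where the correlation-finding breaks down.

  # Schematic build / evaluate / update loop for an LLM-driven world model.
  # All names are hypothetical stand-ins.
  def build_model(observations: list[str]) -> dict:
      # e.g. prompt an LLM to distill observations into structured claims
      return {"claims": observations[:]}

  def evaluate(model: dict, goal: str) -> float:
      # Goal function: score how useful the model is for the objective.
      return min(1.0, len(model["claims"]) / 100)

  def update(model: dict, batch: list[str]) -> dict:
      # The hard step: merging new data means finding correlations across
      # a corpus that may not fit in the model's context window.
      return {"claims": model["claims"] + batch}

  def world_model_loop(observations, goal, stream, threshold=0.5):
      model = build_model(observations)
      for batch in stream:                  # new data arrives over time
          model = update(model, batch)      # the weakest link, per above
          if evaluate(model, goal) < threshold:
              model = build_model(model["claims"])  # rebuild if it degrades
      return model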


Isn't that just RL with extra power-intensive steps? (An entire model chugging away in the goal function)


That's correct, but if successful you'd essentially have updated the LLM's knowledge and capabilities "on the fly".


Maybe we could run off-peak load of that nature, when power is cheaper. Call it dreaming. ;)


And I think you basically just described the OpenAI approach to building models and serving them.


Hard to tell how widespread it is, but it's not just one company. https://www.webpronews.com/silicon-valleys-2025-ai-boom-revi...


Thanks for repeating what the author explained.


Where did they say that?


"The model advances both the frontier coding performance of GPT‑5.2-Codex and the reasoning and professional knowledge capabilities of GPT‑5.2, together in one model, which is also 25% faster. ... With GPT‑5.3-Codex, Codex goes from an agent that can write and review code to an agent that can do nearly anything developers and professionals can do on a computer."

They're specifically saying they're aiming for an overall improvement over the general-purpose GPT-5.2.


Did they post the knowledge cutoff date somewhere?


It's here: https://platform.claude.com/docs/en/about-claude/models/over...

Reliable knowledge cutoff: May 2025, training data cutoff: August 2025


This is the thread for GPT 5.3


The author gives this example of the problem and incorrect way to leverage AI:

"Sarah was relieved. She thought she could focus on high-value synthesis work. She’d take the agent’s output and refine it, add strategic insights, make it client-ready."

Then they propose a long-winded solution which is essentially the exact same thing, but uses the magical term "orchestrate" a few times to make it sound different.


Well, the article was written by AI, so I wouldn't expect it to make valid arguments through a long article like this.


Already the headline is classic shitty AI writing: "This isn't just x, it's y," where y is basically the same thing as x.


In fairness to the author, I think their point was that you take _several_ agents (not just one) and find a way to have them work like a team of 20 people. In the example, Sarah is trying to do the same job she did before, just marginally better.


Yeah, I guess that's accurate, but they also explained that AI capabilities advance every 6-12 months and that managing a team of agents buys you only a few years. So their proposed solution, and the conclusion that it keeps you safe for years, makes no sense right now. Multi-agent orchestration, with an agent doing the orchestrating, is all the rage nowadays.


They made half the point, in my opinion: that you should be "doing the thing that wasn't possible before." But they missed the other half: that maybe the thing you should be doing is owning and creating relationships with customers yourself instead of doing it through a company, which maybe wasn't possible before but is now.


I agree. But the article then seems to suggest, 'you be the one left standing to orchestrate.' It didn't offer much of a suggestion about the other 20 people who would be gone.

It seemed to come down to the old 'just work better, faster, cheaper', but dialed up to 11 now.


I read it more as "look for the thing that was _never done_ because no one was going to hire 20 people to do it." All the examples were pointing out that you _should not_ try to compete with AI on "better, faster, cheaper," because you will lose quickly on all those dimensions.

I realize the irony, of course, that this article is AI-generated but it provoked something close to an epiphany for me even so.


> add strategic insights

This claim has always been BS in my experience.


“existence is a requirement to have morality. That implies that the highest good are those decisions that improve the long-term survival odds of a) humanity, and b) the biosphere.”

Those statements are too pie-in-the-sky to be of any use in answering most real-world moral questions.

