Hacker News | jsnell's comments

It is particularly funny because this is content marketing for a computational proof-of-work "captcha". Those are pure snake oil, with economics that are probably at least four orders of magnitude more favorable to the abusers than this attestation would be.

You can check https://everyuuid.com/ for collisions.

This doesn't seem to be controlling for the number of turns in any way. Am I missing something?

Stronger models needing fewer turns to achieve a task feels like a prime source of efficiency gains for agentic coding, more so than individual responses being shorter.


They also don't mention what their sample size is, or anything about the distribution of input and response lengths.

It'd be interesting if the author actually plotted the data, so we could see whether the analysis holds water.

A ggplot2 geom_density plot of the input lengths, with color and fill mapped to model, alpha 0.1, and an appropriate bandwidth adjustment, would show whether the input distributions look similar across the two models. The same plot for the output lengths, faceted by input-length bins, would give us an idea of whether those look the same too.

Edit: Or even a faceted plot using input bins of output length/input length.
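Absent the actual dataset, here's a toy version of that check in Python rather than ggplot2 (all numbers are synthetic and assumed, since the post doesn't publish its data): binning by input length and comparing per-bin output medians is the numeric analogue of the faceted density plot.

```python
import random
import statistics

random.seed(0)

# Hypothetical samples of (input_tokens, output_tokens) for two models.
# Shapes and parameters are made up purely for illustration.
def sample(brevity, n=2000):
    rows = []
    for _ in range(n):
        inp = random.lognormvariate(7, 1)                      # input length
        out = inp * 0.3 * brevity * random.lognormvariate(0, 0.5)
        rows.append((inp, out))
    return rows

model_a = sample(1.0)
model_b = sample(0.7)   # e.g. a model that answers ~30% shorter

# Facet by input-length bin, then compare output medians per bin --
# the plain-numbers equivalent of facet_wrap(~input_bin) + geom_density.
for lo, hi in [(0, 500), (500, 2000), (2000, 10000)]:
    med_a = statistics.median(o for i, o in model_a if lo <= i < hi)
    med_b = statistics.median(o for i, o in model_b if lo <= i < hi)
    print(f"inputs {lo}-{hi}: median output A={med_a:.0f}  B={med_b:.0f}")
```

If B's median is lower than A's within every input bin, the "shorter responses" claim survives controlling for input length; if the gap only shows up in aggregate, it's a composition effect.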


OpenRouter may see you fire hundreds of requests at them, but they have no idea that "these 50 requests here at 4PM are for task A", "those 100 requests there are for task B", etc. So it's a shallow analysis at the "overall request shape" level.

I think it should be tested on goals.

E.g. Crack this puzzle, fix this code so these tests pass. (A human can verify it doesn't cheese things).


The current bottleneck is silicon. Every chip that is manufactured gets housed and powered. (It makes sense: the cost of compute is dominated by capex and the power costs are comparatively irrelevant, so operators are OK paying a premium for power.)

The space data center hypothesis relies on compute supply growing faster than power supply. (Both are bottlenecked on parts of the supply chain that will take ages to scale.)

Even if you believe that's the case, the point at which orbital data centers start making sense is incredibly sensitive to the exact growth rates.


The current bottleneck is not silicon. There is plenty of silicon locked up in previous gen GPUs that are no longer efficient enough to run relative to newer models. The bottleneck is the economics of owning the older GPU models - which is why all the GPU neoclouds are gonna go bust unless they can get customers to continue renting old GPUs.

The economics are vastly different when opex is near zero for these things.


All of that is incorrect.

H100 rental prices are still as high as when the cards were brand new. The prices vastly exceed the power costs.

In a world where power or DC permits are the current bottleneck, those H100s would be getting retired in favor of Blackwells. But they aren't. They are instead being locked into years-long contracts.


Why exactly would the H100s get retired for Blackwells if specifically power and DC permits were the bottleneck?

Because they are >10x more power efficient.

If silicon were relatively abundant and power/DC space scarce, you'd get an order of magnitude more bang for the Watt by replacing the H100s with newer GPUs.

But nobody is doing that. Blackwells are being installed as additional capacity, not Hopper replacements.

So it is pretty clear that silicon is the primary bottleneck.
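The swap-vs-add argument reduces to simple arithmetic. A sketch with assumed numbers (the ~10x perf/W ratio is the claim above, not a measured figure, and the per-GPU power draw is a round guess):

```python
# At a fixed power budget, the compute you get scales with perf/W.
# All numbers are assumed, purely for illustration.
POWER_BUDGET_KW = 10_000       # a hypothetical 10 MW data hall
GPU_KW = 1.0                   # assumed per-GPU power incl. host/cooling
PERF_RATIO = 10                # claimed Blackwell-vs-H100 perf per Watt

gpus = POWER_BUDGET_KW / GPU_KW
old_compute = gpus * 1.0            # H100 perf normalized to 1
new_compute = gpus * PERF_RATIO     # same Watts, newer silicon

# If power were the binding constraint, swapping yields a 10x compute gain
# for zero extra Watts; if silicon is the constraint, you'd rather keep
# both generations running.
print(f"compute at fixed power: H100s={old_compute:.0f}, swapped={new_compute:.0f}")
```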


Because you'd need to trash the old GPUs in order to make room for new GPUs. Right now new GPUs mostly come online in new DCs. TSMC fab capacity is much more limiting than DC construction, and that will likely remain the case. It's much easier to build a DC than a fab.

Right, where is the rest of the code?

The Google ad network's revenue is 10% of their first-party ad revenue. It would be even harder to make the numbers work that way.

If you try to track down the actual source for that Tom's Hardware link, it becomes pretty obvious that the claim is not credible. [0]

GPUs do not burn out in three years, H100 rentals are priced at the same level as two years ago, and are effectively sold out. [1]

[0] https://news.ycombinator.com/item?id=46203986#46208221

[1] https://newsletter.semianalysis.com/p/the-great-gpu-shortage...


Try updating your Claude Code client. I believe it is a bad interaction between Opus 4.7 and older system prompts.


I don't see how that number could possibly be realistic.

An H100 cost $30k when new, and draws 500W of power.

500W for a year is about 4,400 kWh, which at $0.10/kWh is roughly $440/year if run at full utilization (unrealistic).

TCO of an AI data center should be entirely dominated by capex depreciation.


In fairness, your calculation looks at the most expensive element of the DC but ignores all of the associated parts required to utilize the H100: CPU, memory, cooling, etc. Not to say that flips the calculation (I don't have the answer), but it does leave a lot of power out.


Let's be generous and pretend the rest of the hardware is free but double the energy budget of the H100 to account for all of it along with cooling. You're still at only $1k/yr; $10k over 10 years, or 25% of the TCO (ignoring all other costs).
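Running the thread's own numbers (the $30k price, 500W doubled for host hardware and cooling, $0.10/kWh, and a 10-year life are all the assumptions stated upthread, with the $1k/yr and 25% figures being roundings):

```python
# Re-running the thread's arithmetic with its own stated assumptions.
CAPEX = 30_000            # $ per H100, from upthread
POWER_KW = 0.5 * 2        # 500W draw, doubled for CPU/memory/cooling
PRICE_PER_KWH = 0.10
YEARS = 10
HOURS_PER_YEAR = 24 * 365

energy_per_year = POWER_KW * HOURS_PER_YEAR * PRICE_PER_KWH
energy_total = energy_per_year * YEARS
tco = CAPEX + energy_total        # ignoring all other costs, as above

print(f"energy: ${energy_per_year:,.0f}/yr, ${energy_total:,.0f} over {YEARS}y")
print(f"energy share of TCO: {energy_total / tco:.0%}")
```

Even with the doubled power budget, energy lands under a quarter of the TCO, so capex depreciation dominates.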


> Pick any two cypherpunks at random and you won't find that kind of overlap on non-technical quirks.

That could be a valid methodology if you pre-registered the list of quirks before doing the investigation.

But in this case the journalist clearly didn't do that; they tweaked the set of quirks until it produced the desired outcome.
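The effect of that post-hoc freedom is easy to simulate. A toy Monte Carlo (the quirk count and probabilities are made-up illustration, not estimates about any real population): fixing three specific quirks in advance almost never produces a match between two random people, while letting the investigator pick any three shared quirks after looking at both people "matches" most of the time.

```python
import random

random.seed(42)

N_QUIRKS = 40    # assumed pool of candidate quirks to choose from
P = 0.3          # assumed chance a random person has any given quirk
TRIALS = 10_000

def person():
    return [random.random() < P for _ in range(N_QUIRKS)]

prereg_hits = 0   # both people have all 3 quirks fixed *in advance*
posthoc_hits = 0  # any 3+ shared quirks, chosen *after* looking

for _ in range(TRIALS):
    a, b = person(), person()
    if a[0] and b[0] and a[1] and b[1] and a[2] and b[2]:
        prereg_hits += 1
    if sum(x and y for x, y in zip(a, b)) >= 3:
        posthoc_hits += 1

print(f"pre-registered 3-quirk match: {prereg_hits / TRIALS:.2%}")
print(f"post-hoc 3-quirk match:      {posthoc_hits / TRIALS:.2%}")
```

The gap between those two rates is the garden-of-forking-paths effect: the same "overlap" evidence is damning under pre-registration and nearly worthless when the quirk list is chosen after the fact.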


