It is particularly funny because this is content marketing for a computational proof-of-work "captcha". Those are pure snake oil, with economics that are probably at least four orders of magnitude more favorable to the abusers than this attestation would be.
This doesn't seem to be controlling for the number of turns in any way. Am I missing something?
Stronger models needing fewer turns to achieve a task feels like a prime source of efficiency gains for agentic coding, more so than individual responses being shorter.
They also don't mention what their sample size is, or anything about the distribution of input and response lengths.
It'd be interesting if the author actually plotted the data, so we could see whether the analysis holds water or not.
A plot of the input lengths using ggplot2's geom_density, with color and fill mapped to model, alpha 0.1, and an appropriate bandwidth adjustment, would show whether the input-length distributions look similar across the two models. Doing the same for the output lengths, faceted by input-length bins, would tell us whether those look similar too.
Edit: Or even a plot of output length divided by input length, faceted by the same input-length bins.
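Something like the following would do it. This is only a rough sketch: `df`, `model`, `input_len`, and `output_len` are hypothetical names, since the post doesn't publish its data.

```r
# Sketch, assuming a data frame `df` with one row per request and
# hypothetical columns: model, input_len, output_len.
library(ggplot2)
library(dplyr)

# 1. Are the input-length distributions comparable across the two models?
ggplot(df, aes(input_len, color = model, fill = model)) +
  geom_density(alpha = 0.1, adjust = 1.5)   # adjust = bandwidth multiplier

# 2. Output-length distributions, conditioned on similar inputs:
#    bin the input lengths, then facet the output densities by bin.
df_binned <- df %>%
  mutate(input_bin = cut(input_len, breaks = 5))

ggplot(df_binned, aes(output_len, color = model, fill = model)) +
  geom_density(alpha = 0.1, adjust = 1.5) +
  facet_wrap(~ input_bin, scales = "free_y")

# 3. Per the edit above: output/input length ratio, faceted the same way.
ggplot(df_binned, aes(output_len / input_len, color = model, fill = model)) +
  geom_density(alpha = 0.1, adjust = 1.5) +
  facet_wrap(~ input_bin, scales = "free_y")
```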
OpenRouter may see you fire hundreds of requests at them, but they have no idea that "these 50 requests at 4PM are for task A", "those 100 requests over there are for task B", etc. So it's a shallow analysis at the "overall request shape" level.
The current bottleneck is silicon. Every chip that is manufactured gets housed and powered. (It makes sense: the cost of compute is dominated by capex, so power costs are comparatively irrelevant and operators are fine paying a premium for power.)
The space data center hypothesis relies on compute supply growing faster than power supply. (Both are bottlenecked on parts of the supply chain that will take ages to scale.)
Even if you believe that's the case, the point at which orbital data centers start making sense is incredibly sensitive to the exact growth rates.
The current bottleneck is not silicon. There is plenty of silicon locked up in previous-gen GPUs that are no longer efficient enough to run relative to newer models. The bottleneck is the economics of owning the older GPU models, which is why all the GPU neoclouds are going to go bust unless they can get customers to continue renting old GPUs.
The economics are vastly different when opex is near zero for these things.
H100 rental prices are still as high as when the cards were brand new. The prices vastly exceed the power costs.
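Back-of-the-envelope, with assumed round numbers (~700 W board power, ~$0.08/kWh, ~$2/hr on-demand rate; none of these figures come from the thread):

```r
# Rough comparison of power cost vs rental price per GPU-hour.
# All inputs are assumed ballpark figures, not data from the thread.
kw        <- 0.7          # assumed H100 board power in kW
power_hr  <- kw * 0.08    # ~$0.056 of electricity per GPU-hour
rental_hr <- 2.00         # assumed on-demand H100 rate in $/hr
rental_hr / power_hr      # rental is roughly 35x the power cost
```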
In a world where power or DC permits were the current bottleneck, those H100s would be getting retired in favor of Blackwells. But they aren't. They are instead being locked into years-long contracts.
If silicon were relatively abundant and power/DC space scarce, you'd get an order of magnitude more bang per watt by replacing the H100s with newer GPUs.
But nobody is doing that. Blackwells are being installed as additional capacity, not Hopper replacements.
So it is pretty clear that silicon is the primary bottleneck.
Because you'd need to trash the old GPUs in order to make room for new ones. Right now, new GPUs mostly come online in new DCs. TSMC fab capacity is much more limiting than DC construction, and that will likely remain the case: it's much easier to build a DC than a fab.
In fairness, your calculation looks at the most expensive element of the DC but ignores all of the associated parts required to utilize the H100: CPU, memory, cooling, etc. Not to say that flips the calculation (I don't have the answer), but it does leave a lot of power out.
Let's be generous and pretend the rest of the hardware is free, but double the H100's energy budget to account for all of it plus cooling. You're still at only about $1k/yr, i.e. $10k over 10 years, or roughly 25% of the TCO (ignoring all other costs).
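A quick sanity check of that arithmetic, using assumed round numbers (~700 W doubled to 1.4 kW, ~$0.08/kWh, ~$30k purchase price; none of these figures come from the thread):

```r
# Power cost as a share of (capex + power) TCO over 10 years.
kw_total   <- 1.4                     # 700 W H100, doubled for the rest + cooling
energy_yr  <- kw_total * 8760 * 0.08  # ~$980 per year
energy_10y <- energy_yr * 10          # ~$9.8k over ten years
energy_10y / (30000 + energy_10y)     # ~0.25, i.e. ~25% of TCO
```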