No, those rumors seemed ridiculous even then. Many AI influencers were posting some of the most absurd material, often making basic mistakes (like confusing training tokens with parameters), but anyone in the field could have easily told you that 100T parameters sounded ridiculous.
On that note, "100 Terabytes of GPU memory is exactly what you need to train that class of model." is also likely false. 100 TB is roughly what you'd need just to fit such a model into memory at 1 byte per parameter, not to train it. Training also has to hold gradients, optimizer states, and activations, which pushes the requirement several times higher.
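For a rough sense of the gap, here's a quick back-of-the-envelope sketch. The ~16 bytes/parameter figure assumes mixed-precision training with Adam (fp16 weights and gradients plus fp32 master weights and optimizer moments); the exact multiplier varies by setup, and activations are ignored entirely.

```python
# Hypothetical back-of-the-envelope numbers, not a real training config.
PARAMS = 100e12  # the rumored 100 trillion parameters
TB = 1e12

# Just storing the weights at 1 byte per parameter (e.g. int8):
weights_only_bytes = PARAMS * 1        # ~100 TB

# Mixed-precision Adam state, per parameter (activations not counted):
#   2 bytes fp16 weights + 2 bytes fp16 gradients
#   4 bytes fp32 master weights + 8 bytes fp32 Adam moments
training_state_bytes = PARAMS * 16     # ~1,600 TB

print(f"weights only:            {weights_only_bytes / TB:,.0f} TB")
print(f"training state (approx): {training_state_bytes / TB:,.0f} TB")
```

So even under optimistic assumptions, the memory needed to train that class of model would be well over an order of magnitude beyond 100 TB.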