Hacker News

>We can speculate about just how far this scaling can go or how far is even necessary, but all I've said there is true. We have models trained and evaluated at all those sizes.

The part about "far outperforming", which is the main claim, is wrong though. We've seen much smaller models developed that fare quite well against, and are even competitive with, the larger ones.

You already said "only by training on far more data", which is different from "more parameters" being the only option.



>You already said "only by training on far more data", which is different than "more parameters" being the only option.

I never said that more parameters was the only way to increase performance. I said that the training data required to reach any given performance level x decreases as parameter count grows.

It's literally right there in what I wrote.

>a 50 billion parameter model will far outperform a 5 billion one TRAINED ON THE SAME DATA.
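The tradeoff being argued over here matches empirical scaling-law fits. As a sketch (my addition, not something either commenter cited): the Chinchilla parametric loss from Hoffmann et al. (2022), L(N, D) = E + A/N^α + B/D^β, predicts both that a bigger model trained on the same data reaches lower loss, and that a smaller model can close the gap only with far more tokens. The constants below are the published fit, used purely for illustration.

```python
# Sketch: Chinchilla-style parametric loss fit (Hoffmann et al., 2022).
# Constants are the published fit; treat the numbers as illustrative only.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params, n_tokens):
    """Predicted pretraining loss for n_params parameters
    trained on n_tokens tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta

def tokens_to_match(n_params, target_loss):
    """Tokens a model of n_params needs to reach target_loss
    (infinite if its parameter term alone exceeds the target)."""
    gap = target_loss - E - A / n_params**alpha
    if gap <= 0:
        return float("inf")  # unreachable at any data budget
    return (B / gap) ** (1 / beta)

D = 1e12  # same 1T-token budget for both models
big, small = loss(50e9, D), loss(5e9, D)
# At equal data the 50B model is predicted to reach lower loss than the
# 5B one, and the 5B model needs well over 1T tokens to close the gap.
print(big < small, tokens_to_match(5e9, big) > D)
```

Under this fit, both sides of the exchange are consistent: more parameters at fixed data wins, and a smaller model can still compete if it is fed enough extra data.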



