Hacker News

>We can speculate about just how far this scaling can go or how far is even necessary, but all I've said there is true. We have models trained and evaluated at all those sizes.

The part about "far outperforming", which is the main claim, is wrong though. We've seen much smaller models developed that fare quite well against, and are even competitive with, the larger ones.

You already said "only by training on far more data", which is different from "more parameters" being the only option.



>You already said "only by training on far more data", which is different than "more parameters" being the only option.

I never said that more parameters was the only way to increase performance. I said that the training data required to reach any given performance level x decreases as parameter count grows.

It's literally right there in what I wrote.

>a 50 billion parameter model will far outperform a 5 billion one TRAINED ON THE SAME DATA.
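The tradeoff being argued over here matches empirical scaling-law fits. As a sketch (my addition, not something either commenter cited): the Chinchilla parametric loss from Hoffmann et al. (2022), L(N, D) = E + A/N^α + B/D^β, predicts both that a bigger model trained on the same data reaches lower loss, and that a smaller model can close the gap only with far more tokens. The constants below are the published fit, used purely for illustration.

```python
# Sketch: Chinchilla-style parametric loss fit (Hoffmann et al., 2022).
# Constants are the published fit; treat the numbers as illustrative only.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params, n_tokens):
    """Predicted pretraining loss for n_params parameters
    trained on n_tokens tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta

def tokens_to_match(n_params, target_loss):
    """Tokens a model of n_params needs to reach target_loss
    (infinite if its parameter term alone exceeds the target)."""
    gap = target_loss - E - A / n_params**alpha
    if gap <= 0:
        return float("inf")  # unreachable at any data budget
    return (B / gap) ** (1 / beta)

D = 1e12  # same 1T-token budget for both models
big, small = loss(50e9, D), loss(5e9, D)
# At equal data the 50B model is predicted to reach lower loss than the
# 5B one, and the 5B model needs well over 1T tokens to close the gap.
print(big < small, tokens_to_match(5e9, big) > D)
```

Under this fit, both sides of the exchange are consistent: more parameters at fixed data wins, and a smaller model can still compete if it is fed enough extra data.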



