Hacker News

But Chinchilla optimality, while an interesting result, is a strange target for most practical purposes. Training happens once; inference happens many times. Not training past the point where it's cheaper to train a larger model to the same (proxy for) quality implicitly discounts the cost of inference to zero.
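A toy illustration of that tradeoff (all numbers here are made up for the sketch, not real training or serving costs): an over-trained smaller model can cost more to train to a given quality than a Chinchilla-optimal larger model, yet win on lifetime cost once enough inference tokens are served.

```python
# Hypothetical illustration: total cost = one-time training cost plus
# lifetime inference cost.  Numbers are invented for the example.

def total_cost(train_cost: float, cost_per_1k_tokens: float,
               tokens_served: float) -> float:
    """One-time training cost plus cumulative inference cost."""
    return train_cost + cost_per_1k_tokens * tokens_served / 1_000

# Assumed: both models reach the same quality.  The larger model is
# cheaper to train to that quality but costs more per inference token.
tokens = 1e12  # lifetime tokens served (assumed)
big   = total_cost(train_cost=1.0e6, cost_per_1k_tokens=0.002, tokens_served=tokens)
small = total_cost(train_cost=1.5e6, cost_per_1k_tokens=0.001, tokens_served=tokens)

print(f"larger, Chinchilla-optimal model: ${big:,.0f}")    # $3,000,000
print(f"smaller, over-trained model:      ${small:,.0f}")  # $2,500,000
```

At low serving volume the larger model wins; past a break-even point the over-trained smaller model is cheaper overall, which is why serving-heavy deployments train well past Chinchilla-optimal token counts.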


Yep, but if Stability's goal is to train the best possible model, that would explain the choices they made.


I mean, 800B tokens on a 3B model and a 7B model is still way beyond the Chinchilla-optimal scale.
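To make "way beyond" concrete, here's a quick sketch using the commonly cited ~20-tokens-per-parameter summary of the Chinchilla result (an approximation of the paper's fitted scaling law, not the exact fit):

```python
# Assumed heuristic: Chinchilla compute-optimal training is often
# summarized as roughly 20 training tokens per model parameter.
TOKENS_PER_PARAM = 20

def chinchilla_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal token count for a given parameter count."""
    return TOKENS_PER_PARAM * n_params

for params in (3e9, 7e9):
    optimal = chinchilla_optimal_tokens(params)
    actual = 800e9
    print(f"{params/1e9:.0f}B params: ~{optimal/1e9:.0f}B optimal tokens, "
          f"trained on {actual/1e9:.0f}B ({actual/optimal:.1f}x over)")
```

Under that heuristic, 800B tokens is roughly 13x the compute-optimal count for a 3B model and roughly 6x for a 7B model.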


They're going to 1.5T and possibly 3T tokens. The 800B is just for the "Alpha" checkpoints released today; new checkpoints will be released later.



