Hacker News

This is somehow missing the Gemma and Gemini series of models from Google. Leaving out the T5 series also seems strange from a historical perspective, since those models pioneered many of the concepts in transfer learning and kicked off quite a bit of interest in this space.


The Gemma models are too small to be included in this list.

You're right that the T5 models are very important historically, but they're below 11B and I don't have much to say about them. Definitely a very interesting and important set of models, though.


> too small

Eh?

* Gemma 1 (2024): 2B, 7B

* Gemma 2 (2024): 2B, 9B, 27B

* Gemma 3 (2025): 1B, 4B, 12B, 27B

This is the same range as some Llama models which you do mention.

> important historically

Aren't you trying to give a historical perspective? What's the point of this?


Since you included GPT-2, I would think everything from Google, including T5, would qualify for the list.



