
About how many training steps are required to get good output?


It depends on the model size, batch size, input sequence length, and so on. With a model this small you'll never get 'good' output, but you can maximise its potential.


I trained for 12,000 steps at 4 layers, and the output is kind of name-like, but it didn't reproduce any actual name from its training data in 20 or so generations.
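
For anyone wanting to run the same check over more than 20 samples, here is a minimal sketch of how you could count verbatim reproductions. The function name `memorization_rate` and the case-insensitive, whitespace-trimmed matching are my own assumptions, not anything from the project being discussed.

```python
def memorization_rate(generated, training_names):
    """Return (rate, matches): the fraction of generated samples that
    appear verbatim in the training data, plus the matching samples.

    Hypothetical helper: comparison is case-insensitive and ignores
    surrounding whitespace, which is an assumption about the data.
    """
    train = {name.strip().lower() for name in training_names}
    samples = [g.strip().lower() for g in generated]
    matches = [s for s in samples if s in train]
    rate = len(matches) / len(samples) if samples else 0.0
    return rate, matches
```

A rate of zero over a small sample does not say much either way; the number only becomes informative with a few hundred generations or more.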


Not many. Diminishing returns start before 1,000 steps, and past that you should just add a second or third layer.
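
If you log the training loss, one rough way to see where returns flatten out is to compare the mean loss over consecutive windows of steps. This is only a sketch: `plateau_step`, the window of 100 steps, and the 1% threshold are arbitrary choices of mine, and it assumes `losses` is a plain list of positive per-step loss values.

```python
def plateau_step(losses, window=100, min_rel_improvement=0.01):
    """Return the step index where the first non-improving window starts,
    or None if every window improved on the previous one.

    A window counts as "improving" when its mean loss is lower than the
    previous window's mean by at least min_rel_improvement (relative).
    All thresholds here are illustrative assumptions.
    """
    for end in range(2 * window, len(losses) + 1, window):
        prev = sum(losses[end - 2 * window:end - window]) / window
        curr = sum(losses[end - window:end]) / window
        if prev - curr < min_rel_improvement * prev:
            return end - window
    return None
```

Per-step loss is noisy, so a window that is too short will report a plateau too early.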



