
I desperately want a method to approximate this, but unfortunately it's intractable in practice.

That may make it sound more complicated than it is, since it seems like it should be a back-of-the-napkin calculation, but there are just too many nuances that affect perf.

Very roughly, at this point I expect a 4B model to run at 10 tok/s on a two-year-old smartphone with 8 GB of RAM. I'd expect you'd get something similar; my guess would be 6 tok/s at 4B (assuming the rest of the hardware is 2018-era and you'll rely on GPU inference and RAM).
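
For what it's worth, the napkin version people usually reach for is the memory-bandwidth rule of thumb: each generated token has to read every weight once, so tok/s is roughly usable bandwidth divided by model size in bytes. A minimal sketch of that, where the function name, the 4-bit quantization, the 40 GB/s bandwidth, and the 0.5 efficiency factor are all illustrative assumptions rather than measurements of any real device:

```python
def decode_tokens_per_second(params_billion, bits_per_weight,
                             bandwidth_gb_s, efficiency=0.5):
    """Rough bandwidth-bound estimate of decode speed.

    Assumes every weight is read once per generated token and that
    only `efficiency` of the peak memory bandwidth is actually achieved.
    All inputs are guesses the caller has to supply.
    """
    model_bytes = params_billion * 1e9 * bits_per_weight / 8
    return efficiency * bandwidth_gb_s * 1e9 / model_bytes


# Hypothetical numbers: 4B parameters at 4 bits/weight is about 2 GB of weights.
estimate = decode_tokens_per_second(params_billion=4, bits_per_weight=4,
                                    bandwidth_gb_s=40)
print(f"{estimate:.1f} tok/s")
```

This ignores all of the nuances mentioned above, so treat the result as a rough ceiling under the stated assumptions, not a prediction.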


