
> Are they doubling down on local LLMs then?

Neural Accelerators (aka NAX) accelerate matmuls with tile sizes >= 32. From a very high level, LLM inference has two phases: (chunked) prefill and decode. The former is matrix-matrix multiplication (GEMM) and the latter is matrix-vector multiplication (GEMV). Neural Accelerators make the former (prefill) faster and have no impact on the latter.
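To make the GEMM vs GEMV distinction concrete, here is a rough sketch in plain Python. Everything in it is an illustrative assumption, not something taken from Apple's documentation: the 4096x4096 projection, the 512-token prefill chunk, fp16 weights, batch size 1 during decode, and reading "tile sizes >= 32" as "every matmul dimension is at least 32". The helper names are made up for this example.

```python
TILE = 32  # threshold quoted in the comment above; how it is applied here is an assumption


def projection_shape(tokens, d_in, d_out):
    """Return (M, K, N) for activations (tokens x d_in) times weights (d_in x d_out)."""
    return tokens, d_in, d_out


def kind(m):
    """One activation row is a matrix-vector product; more rows make it a matrix-matrix product."""
    return "GEMV" if m == 1 else "GEMM"


def can_fill_tile(m, k, n, tile=TILE):
    """Assumed reading of the tile claim: every dimension must reach the tile size."""
    return min(m, k, n) >= tile


def arithmetic_intensity(m, k, n, bytes_per_elem=2):
    """FLOPs per byte moved, counting one read of each input and one write of the output."""
    flops = 2 * m * k * n  # one multiply and one add per (m, k, n) triple
    traffic = bytes_per_elem * (m * k + k * n + m * n)
    return flops / traffic


# Hypothetical sizes: a 4096 x 4096 projection, 512-token prefill chunk, 1-token decode step.
prefill = projection_shape(tokens=512, d_in=4096, d_out=4096)
decode = projection_shape(tokens=1, d_in=4096, d_out=4096)

for name, (m, k, n) in (("prefill", prefill), ("decode", decode)):
    print(name, kind(m), "fills tile:", can_fill_tile(m, k, n),
          "FLOPs/byte: %.1f" % arithmetic_intensity(m, k, n))
```

With these made-up sizes, decode comes out at just under 1 FLOP per byte while the prefill chunk is around 410, which is one way to see why a faster matmul unit helps prefill while decode stays limited by how fast the weights can be read.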


