
Thanks for reading the post and the GitHub README. Supporting training is definitely feasible, but the benefit may not be as significant as for low-latency inference, since training generally involves much larger kernels, which makes kernel launch overhead proportionally less significant.
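The launch-overhead argument can be sketched with a toy cost model. All numbers below are illustrative assumptions, not measurements: the per-launch cost and kernel runtimes are made up for the sketch. The point is that with a roughly fixed per-launch cost, the overhead *fraction* shrinks as kernel runtime grows, so eliminating launches helps short inference kernels far more than long training kernels.

```python
# Toy cost model for GPU kernel launch overhead (illustrative numbers only).
# A kernel launch costs a roughly fixed amount of CPU/driver time, so short
# kernels spend a much larger fraction of wall time on launches.

LAUNCH_OVERHEAD_US = 5.0  # assumed per-launch cost in microseconds


def overhead_fraction(kernel_time_us: float, n_launches: int = 1) -> float:
    """Fraction of total wall time spent on launch overhead."""
    total_overhead = LAUNCH_OVERHEAD_US * n_launches
    total_compute = kernel_time_us * n_launches
    return total_overhead / (total_overhead + total_compute)


# Small decode-step inference kernel (~10 us): overhead dominates.
small = overhead_fraction(kernel_time_us=10.0)
# Large training kernel (~2 ms): overhead is nearly negligible.
large = overhead_fraction(kernel_time_us=2000.0)

print(f"small kernel: {small:.0%} launch overhead")
print(f"large kernel: {large:.1%} launch overhead")
```

Under these assumed numbers the small kernel spends about a third of its time on launches, while the large kernel spends well under one percent, which is why fusing everything into a persistent kernel pays off mainly at inference time.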

Thanks for sharing the FlashDMoE work. Our next step is to support MoE models. Stay tuned!



Thanks for the input; that's very helpful to know.

I look forward to following Mirage's development.



