
Most API providers (Together, Fireworks, etc.) don't build their own models.


You don't need a new model. The trick is that you only change how tokens are sampled: zero out the probability of every token that would be illegal under the grammar or other constraints, then sample from the tokens that remain.

All you need for that is an inference API that gives you the full output vector (the logits over the whole vocabulary), which is trivial for any model you run on your own hardware.
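
To make that concrete, here is a rough sketch of a single sampling step in plain Python. The names `masked_probs` and `constrained_sample` are made up for illustration, and `allowed` stands in for whatever set of token ids your grammar checker says is legal at this position; how you compute that set is the hard part and isn't shown here.

```python
import math
import random

def masked_probs(logits, allowed):
    """Softmax over the logits, with every token outside `allowed` forced to 0."""
    if not allowed:
        raise ValueError("the constraint leaves no legal token")
    # Subtract the largest allowed logit before exponentiating, for stability.
    top = max(logits[i] for i in allowed)
    weights = [
        math.exp(logit - top) if i in allowed else 0.0
        for i, logit in enumerate(logits)
    ]
    total = sum(weights)
    # Renormalize so the legal tokens' probabilities sum to 1.
    return [w / total for w in weights]

def constrained_sample(logits, allowed, rng=random):
    """Pick one token id, never one the constraint forbids."""
    probs = masked_probs(logits, allowed)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Toy 4-token vocabulary; pretend the grammar only permits ids 1 and 3 here.
logits = [2.0, 1.0, 0.5, -1.0]
token_id = constrained_sample(logits, allowed={1, 3})
```

Note that the model itself is untouched: the highest-scoring token (id 0) is simply never picked because the constraint rules it out. You repeat this at every position, recomputing `allowed` from the text generated so far.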


Fireworks, though, is one of the few providers that support structured generation.



