
It's a tradeoff between getting "good enough" performance w/ guided/constrained generation and using 2x calls to do the same task. Sometimes it works, sometimes it's better to have a separate model. One good case for 2 calls is the "code merging" thing, where you "chat" with a model, giving it a source file + some instruction. If it replies with something like ... //unchanged code here ... some new code ... //the rest stays the same, then you can use a code merging model to apply the changes. But that's been made somewhat obsolete by the new "agentic" capabilities, where models learn how to edit files via diffs directly.
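The routing decision above can be sketched as a simple check: if the first model's reply elides code with markers like the ones quoted, hand it to a merge pass; otherwise use it as-is. The specific marker strings and the function name are illustrative assumptions, not a real library's API.

```python
import re

# Hypothetical elision markers a chat model might emit instead of full code.
# The merge model (second call) would reconcile these with the original file.
ELISION_PATTERNS = [
    r"//\s*unchanged code here",
    r"//\s*the rest stays the same",
]

def needs_merge_pass(model_output: str) -> bool:
    """Return True if the reply elides code, so a code-merging pass is needed."""
    return any(re.search(p, model_output, re.IGNORECASE) for p in ELISION_PATTERNS)
```

In practice you'd only pay for the second call when this returns True, which is the whole point of the 2x-call tradeoff.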


Haiku is my favorite model for the second pass. It's small, cheap, and usually gets it right. If I see hallucinations, they mostly come from the base model in the first pass.


Depending on the task, you can often get it done in a single request on average. Ask for the output in Markdown with the reasoning up front and the structured output in a code block at the end, then extract and parse that block in code.




