
It's a tradeoff between getting "good enough" performance w/ guided/constrained generation and using 2x calls to do the same task. Sometimes it works, sometimes it's better to have a separate model. One good case for 2 calls is the "code merging" thing, where you "chat" with a model, giving it a source file + some instruction. If it replies with something like ... //unchanged code here ... some new code ... //the rest stays the same, then you can use a code merging model to apply the changes. But that's been made somewhat obsolete by the new "agentic" capabilities, where models learn how to edit files via diffs directly.
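The routing decision above can be sketched as a simple check: if the first model's reply elides code with markers like the ones quoted, hand it to a merge pass; otherwise use it as-is. The specific marker strings and the function name are illustrative assumptions, not a real library's API.

```python
import re

# Hypothetical elision markers a chat model might emit instead of full code.
# The merge model (second call) would reconcile these with the original file.
ELISION_PATTERNS = [
    r"//\s*unchanged code here",
    r"//\s*the rest stays the same",
]

def needs_merge_pass(model_output: str) -> bool:
    """Return True if the reply elides code, so a code-merging pass is needed."""
    return any(re.search(p, model_output, re.IGNORECASE) for p in ELISION_PATTERNS)
```

In practice you'd only pay for the second call when this returns True, which is the whole point of the 2x-call tradeoff.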


Haiku is my favorite model for the second pass. It's small, cheap, and usually gets it right. If I see hallucinations, they mostly come from the base model in the first pass.


Depending on the task, you can often get it done in a single request on average. Ask for the output in Markdown with the reasoning up front and the structured output in a code block at the end, then extract and parse that block in code.




