hellovai's comments

oh that's really interesting, how often do you get errors like that?

fyi, we actually fix those specific errors in our parser :)


The main drawback really shows up when you attempt more advanced prompting techniques like chain-of-thought or reasoning.

forcing those parts to be JSON can be hard and can unnecessarily constrain the model. e.g. https://www.promptfiddle.com/Chain-of-Thought-KcSBh

try pressing "run tests" and you'll see what I mean! this method of doing chain of thought works a bit better
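A minimal sketch of the general idea (not necessarily what the linked fiddle does): let the model reason in plain text first, then pull out only the trailing JSON object, so the reasoning itself is never forced into JSON. The prompt wording and the `answer`/`confidence` fields are hypothetical.

```python
import json
from typing import Any

def build_prompt(question: str) -> str:
    # Hypothetical wording: free-form reasoning first, structured answer last.
    return (
        f"{question}\n\n"
        "First, think step by step in plain text.\n"
        'Then, on the last lines, output JSON like {"answer": ..., "confidence": ...}.'
    )

def parse_trailing_json(completion: str) -> Any:
    """Return the last decodable JSON object in the completion, ignoring the reasoning before it."""
    decoder = json.JSONDecoder()
    last = None
    i = completion.find("{")
    while i != -1:
        try:
            # raw_decode parses one JSON value starting at index i and reports where it ended
            value, end = decoder.raw_decode(completion, i)
            last = value
            i = completion.find("{", end)
        except json.JSONDecodeError:
            # a stray brace in the reasoning text: skip it and keep scanning
            i = completion.find("{", i + 1)
    if last is None:
        raise ValueError("no JSON object found in completion")
    return last
```

Because only the tail is parsed, the model can write whatever reasoning it likes, including braces or draft answers, before the final object.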


;) https://www.promptfiddle.com/structured-summary-66myE (sorry for the bad syntax highlighting when including BAML code in BAML code)

    {
      author: "Sam Lijin"
      key_points: [
        "Structured output from LLMs, like JSON, is a common challenge."
        "Existing solutions like response_format: 'json' and function calling often disappoint."
        "The article compares multiple frameworks designed to handle structured output."
        "Handling and preventing malformed JSON is a critical concern."
        "Two main techniques for this: parsing malformed JSON or constraining LLM token generation."
        "Framework comparison includes details on language support, JSON handling, prompt building, control, model providers, API flavors, type definitions, and test frameworks."
        "BAML is noted for its robust handling of malformed JSON using a new Rust-based parser."
        "Instructor supports multiple LLM providers but has limitations on prompt control."
        "Guidance, Outlines, and others apply LLM token constraints but have limitations with models like OpenAI's."
      ]
      take_way: "Consider using frameworks that efficiently handle malformed JSON and offer prompt control to get the desired structured output from LLMs."
    }


that's a great question, there are four main benefits:

1. Seeing the full prompt. Even though that Python code feels leaner, somehow it has to be converted into a prompt, and a library will do that in some way. BAML has a VS Code playground that shows the entire prompt plus its tokenization. If we had to derive this from arbitrary Python/TS, we would run into the halting problem, and building the playground would be much, much harder.

2. We do a lot of codegen for users to make life easier. E.g., without BAML, to stream the resume you would have to write something like this:

    from typing import List, Optional

    class PartialResume:
        name: Optional[str]
        education: List["PartialEducation"]
        skills: List[str]

and then at some point you need to reparse PartialResume -> Resume. We can codegen all of that for you and give you autocomplete and type safety for free.

3. We added a lot of static analysis (jump-to-definition, etc.) to Jinja, which we use for string templating, and that is much easier to navigate than f-strings.

4. Since it's codegen, we can support every language much more easily, so a prompting technique works exactly the same way whether you call it from Python or TypeScript.
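For point 2, a hand-written sketch of the kind of boilerplate involved: a strict `Resume` for the final result, an all-optional `PartialResume` for mid-stream snapshots, and a `finalize` step converting one into the other. The `Education` fields (`school`, `degree`) are hypothetical, and this is not the code BAML actually generates.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Education:  # hypothetical fields, for illustration only
    school: str
    degree: str

@dataclass
class Resume:
    name: str
    education: List[Education]
    skills: List[str]

@dataclass
class PartialEducation:  # every field optional while tokens are still streaming in
    school: Optional[str] = None
    degree: Optional[str] = None

@dataclass
class PartialResume:
    name: Optional[str] = None
    education: List[PartialEducation] = field(default_factory=list)
    skills: List[str] = field(default_factory=list)

def finalize(partial: PartialResume) -> Resume:
    """Convert the last streamed snapshot into the strict type, failing loudly if a required field never arrived."""
    if partial.name is None:
        raise ValueError("resume is missing 'name'")
    education = []
    for i, edu in enumerate(partial.education):
        if edu.school is None or edu.degree is None:
            raise ValueError(f"education[{i}] is incomplete")
        education.append(Education(school=edu.school, degree=edu.degree))
    return Resume(name=partial.name, education=education, skills=list(partial.skills))
```

Every new field in `Resume` has to be mirrored in `PartialResume` and in `finalize` by hand, which is exactly the duplication codegen removes.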


the main one is that most people don't own the model, so if you use OpenAI / Anthropic / etc., you can't use token masking. In that case, reprompting is pretty much the only option.


In the specific cases of OpenAI and Anthropic, both have 'tool use' interfaces which will generate valid JSON following a schema of your choice.

You're right, though, that reprompting works with pretty much everything out there, including hosted models that don't have tool use as part of their API. And it's simple too; you don't even need to know what "token masking" is.

Reprompting can also apply arbitrary criteria that are more complex than just a JSON schema. You ask it to choose an excerpt of a document and the string it returns isn't an excerpt? Just reprompt.
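A minimal sketch of that reprompting loop in Python. `call_model` is a hypothetical stand-in for whatever client you use, and the feedback wording is made up; the point is that `is_valid` can be any check, such as "is this string actually an excerpt of the document?", not just a JSON schema.

```python
from typing import Callable, Optional

def reprompt_until_valid(
    call_model: Callable[[str], str],
    prompt: str,
    is_valid: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Call the model until its answer passes `is_valid`, or give up after max_attempts."""
    current_prompt = prompt
    for _ in range(max_attempts):
        answer = call_model(current_prompt)
        if is_valid(answer):
            return answer
        # Each retry is a full extra model call, which is where the latency and cost come from.
        current_prompt = (
            f"{prompt}\n\nYour previous answer was rejected:\n{answer}\n"
            "Please answer again, following the instructions exactly."
        )
    return None

if __name__ == "__main__":
    document = "The quick brown fox jumps over the lazy dog."
    fake_answers = iter(["A fox jumped.", "brown fox jumps"])  # stand-in for a real model
    excerpt = reprompt_until_valid(
        lambda p: next(fake_answers),
        f"Quote a phrase from this document:\n{document}",
        lambda s: s.strip() != "" and s in document,
    )
    print(excerpt)
```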


It does. With OpenAI at least, you definitely can use token masking. There are some limitations, but even those can be circumvented. I have used token masking on the OpenAI API with LMQL without any issues.


that's pretty cool! We'll update the page after taking a look at the library!


our paid product is actually still in beta as we continue to build it out, but BAML itself is and always will be open source (it runs fully locally as well, with no extra network calls).

in terms of parsing, I do think ours is likely the best approach as of now. Most other libraries either reprompt or rely on constrained grammars. Reprompting = slow + $$; constrained grammars = you need to own the model. We just tried a new approach: parse the output more cleverly.


not a noob question, here's roughly how LLM generation works:

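```python
def generate(prompt, call_model, pick_best):
    output = []
    while True:
        # score every candidate next token given the prompt plus what has been generated so far
        token_probabilities = call_model(prompt + "".join(output))
        best_token = pick_best(token_probabilities)
        if best_token == "<END>":
            break
        output.append(best_token)
    return "".join(output)
```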

basically, to support constrained generation they would need to modify pick_best to apply the constraint, which would stop them from optimizing the hot loop at their scale. They do support very broad output constraints like JSON that apply to everyone, but that leads to other issues (things like chain-of-thought/reasoning perform way worse in structured responses).
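To make the pick_best point concrete, here is a toy Python version of greedy selection with and without token masking. The vocabulary, the probabilities, and the "output must be one of `choices`" constraint are hypothetical; real constrained decoders (grammar- or regex-based) work over token IDs and logits, but the idea of discarding disallowed tokens before picking is the same.

```python
from typing import Dict, List

def pick_best(token_probabilities: Dict[str, float]) -> str:
    """Unconstrained greedy decoding: take the single most likely token."""
    return max(token_probabilities, key=token_probabilities.get)

def pick_best_constrained(
    token_probabilities: Dict[str, float], output: List[str], choices: List[str]
) -> str:
    """Token masking: drop tokens that cannot lead to one of `choices`, then pick greedily."""
    so_far = "".join(output)
    allowed: Dict[str, float] = {}
    for token, p in token_probabilities.items():
        if token == "<END>":
            ok = so_far in choices  # only allow stopping once a full choice has been produced
        else:
            ok = any(choice.startswith(so_far + token) for choice in choices)
        if ok:
            allowed[token] = p
    if not allowed:
        raise ValueError("no token is consistent with the constraint")
    return max(allowed, key=allowed.get)
```

Note that the constraint check runs on every step for every candidate token, which is the extra work inside the hot loop described above.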


> things like chain-of-thought/reasoning perform way worse in structured responses

That is fairly well established to be untrue.


XML is also a great option, but there are a few trade-offs:

> XML uses many more tokens (much slower + $$$ for complex schemas)

> whether you're looking for } or </output>, it's really a matter of "does your parser work?". When you have three tokens that all need to be correct ("</", "output", ">"), the odds of a mistake are higher than when you just need "}".

That said, the parser is much easier to write; we're actually considering supporting XML in BAML. Have you found any reduction in accuracy?

Also, not sure if you saw this, but apparently Claude doesn't actually prefer XML; it just happens to work well with it. This was recently new info for me as well: https://x.com/alexalbert__/status/1778550859598807178 (devrel @ Anthropic)
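A rough illustration of why an XML-style parser can be simpler: the closing tag is a unique delimiter, so a non-greedy regex can find the block inside surrounding prose without tracking brace depth, and the standard library can parse what it finds. The `<output>` tag name and the flat child-element layout are assumptions; malformed XML inside the tags will still raise `ET.ParseError`.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, Optional

def extract_output(completion: str, tag: str = "output") -> Optional[Dict[str, str]]:
    """Find the first <tag>...</tag> block in free-form model text and return its children as a dict."""
    name = re.escape(tag)
    match = re.search(rf"<{name}>.*?</{name}>", completion, re.DOTALL)
    if match is None:
        return None
    root = ET.fromstring(match.group(0))
    # flat layout assumed: each child element becomes one key/value pair
    return {child.tag: (child.text or "").strip() for child in root}
```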


I think once you have < and /, the rest becomes much easier to predict. In a way it “spreads” the prediction over several tokens.

The < indicates that the preceding information is in fact over. The “/” signals that we are closing something rather than starting a subtopic, and “output” defines what we are closing. The final “>” ensures that our “output” string has ended. In JSON, all of that semantic meaning gets packed into the single token }.


Hmm, that's an interesting way of thinking about it. The way I see it, I trust XML less, because the sparser representation gives it more room to make a mistake: if you think of every token as an opportunity to be correct or wrong, the higher token count needed to represent content in XML gives the model a higher chance to get the output wrong (kinda like the birthday paradox).

(Plus, more output tokens is more expensive!)

e.g.

using the cl100k_base tokenizer (what GPT-4 uses), this JSON is 60 tokens:

    {
      "method": "GET",
      "endpoint": "/api/model/details",
      "headers": {
        "Authorization": "Bearer YOUR_ACCESS_TOKEN",
        "Content-Type": "application/json"
      },
      "queryParams": {
        "model_id": "12345"
      }
    }
whereas this XML is 76 tokens:

    <?xml version="1.0" encoding="UTF-8" ?>
    <method>GET</method>
    <endpoint>/api/model/details</endpoint>
    <headers>
        <Authorization>Bearer YOUR_ACCESS_TOKEN</Authorization>
        <Content-Type>application/json</Content-Type>
    </headers>
    <queryParams>
        <model_id>12345</model_id>
    </queryParams>
You can check out the tokenization here by toggling "show tokens": https://www.promptfiddle.com/json-vs-xml-token-count-BtXe3
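The token-count argument can be put into rough numbers. Assuming, as a simplification, that every output token independently has the same small chance `per_token_error` of being wrong (the parent comment argues XML's tag tokens are not independent in this way), the chance of an error-free span shrinks geometrically with its length. The 0.1% error rate below is a made-up illustration, not a measured value.

```python
def p_all_correct(per_token_error: float, n_tokens: int) -> float:
    """Probability that all n_tokens are correct under independent, identical per-token errors."""
    return (1.0 - per_token_error) ** n_tokens

if __name__ == "__main__":
    err = 0.001  # hypothetical per-token error rate
    # closing delimiter: "}" (1 token) vs "</" "output" ">" (3 tokens)
    print(f"closing delimiter: JSON {p_all_correct(err, 1):.4f} vs XML {p_all_correct(err, 3):.4f}")
    # whole payload: 60 tokens (JSON) vs 76 tokens (XML), per the counts above
    print(f"whole payload:     JSON {p_all_correct(err, 60):.4f} vs XML {p_all_correct(err, 76):.4f}")
```

Under this model the gap per delimiter is tiny but compounds with payload length; if XML's tag tokens really are easier to predict, that would show up as a lower `per_token_error` for those tokens rather than a fixed one.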


you will love YAML, since it's a similar improvement in token usage over JSON


Hey everyone! One of the creators of BAML here! Thanks for sharing this post. For anyone interested in playing around with an interactive version of BAML online, check it out here: https://www.promptfiddle.com


Really interesting library! In the docs, could you describe in a bit more detail which kinds of JSON errors it tolerates? And which models currently work best with your parsing approach?


Thanks! We should add that to the docs haha. But here are a few:

- keys without quotes

- coercing a single value into an array when the response requires an array

- removing any prefix or suffix tags

- picking the best of many JSON candidates in a string

- unescaped newlines and quotes, so "afds"asdf" converts to "afds\"asdf"

In terms of models, honestly, we've tried models as weak as Llama 2, and it seems to work in quite a few use cases.
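A toy Python sketch of a few of the repairs listed above (prefix/suffix text, multiple candidates, unquoted keys, a single value where an array is expected). It is not BAML's Rust parser; the `array_fields` parameter and the "most keys wins" heuristic are assumptions for illustration, and it does not attempt the unescaped-quote repair.

```python
import json
import re
from typing import Iterable, List, Optional

# Matches an identifier used as a key right after "{" or ",", e.g. {name: ...}.
# Toy limitation: it ignores string contents, so a ", word:" inside a string value would also get quoted.
BARE_KEY = re.compile(r'([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)(\s*:)')

def json_candidates(text: str) -> List[str]:
    """Return every top-level balanced {...} span, skipping any prose before, between, or after them."""
    candidates: List[str] = []
    depth, start = 0, 0
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                candidates.append(text[start:i + 1])
    return candidates

def lenient_parse(text: str, array_fields: Iterable[str] = ()) -> Optional[dict]:
    """Parse the 'best' JSON object hidden in model output, or return None."""
    best: Optional[dict] = None
    for candidate in json_candidates(text):
        try:
            value = json.loads(BARE_KEY.sub(r'\1"\2"\3', candidate))
        except json.JSONDecodeError:
            continue  # unrepairable candidate: try the next one
        # Heuristic: prefer the candidate with the most keys.
        if isinstance(value, dict) and (best is None or len(value) > len(best)):
            best = value
    if best is not None:
        for name in array_fields:
            if name in best and not isinstance(best[name], list):
                best[name] = [best[name]]  # schema expects an array, model gave one item
    return best
```

Passing the schema's array fields in explicitly stands in for what a schema-aware parser would know on its own.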


Thanks! I see myself using the library soon :-)

