Why would you exclusively support schema inference, rather than also allowing users to manually specify their schemas?

Schema inference is very difficult to do correctly and safely, especially with small initial samples of instances (source: work on https://github.com/snowplow/schema-guru).



There's a little bit of terminology overloading going on here.

In Noms every value has a type. It's an immutable system, so this type just is. The type of `42` is `Number`. The type of `"foobar"` is `String`. The type of `[42,44]` is `List<Number>`. And if you add "foo" to that list, the type becomes `List<Number|String>`.

We don't try to infer a general database schema from a few instances of data. We just apply this aggregation up the tree and report the result.
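That aggregation can be sketched in a few lines of Go. This is an illustrative toy, not the actual Noms implementation: `inferType` here returns a type string for a value, and for a list it takes the union of its elements' types, recursively, so the result "aggregates up the tree" exactly as described.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// inferType returns a Noms-style type description for a value.
// For a list, it aggregates the element types into a union.
// (Illustrative sketch only; not the real Noms type system.)
func inferType(v interface{}) string {
	switch v := v.(type) {
	case float64:
		return "Number"
	case string:
		return "String"
	case []interface{}:
		seen := map[string]bool{}
		for _, e := range v {
			seen[inferType(e)] = true
		}
		parts := make([]string, 0, len(seen))
		for t := range seen {
			parts = append(parts, t)
		}
		sort.Strings(parts)
		return "List<" + strings.Join(parts, "|") + ">"
	}
	return "Unknown"
}

func main() {
	fmt.Println(inferType([]interface{}{42.0, 44.0}))        // List<Number>
	fmt.Println(inferType([]interface{}{42.0, 44.0, "foo"})) // List<Number|String>

	// Aggregation composes up the tree for nested values too:
	nested := []interface{}{[]interface{}{42.0, "foo"}}
	fmt.Println(inferType(nested)) // List<List<Number|String>>
}
```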

That all said, we do want to eventually add schema _validation_, by which I mean the ability to associate a type with a dataset and have the database enforce that any value committed to the dataset is compatible with that type (following subtyping rules).


It sounds like Noms is dynamically typed, rather like SQLite — types are associated with values, not (just) with datasets. The difference is that SQLite (like Python or JS) only types leaf/atomic data, while you're also typing aggregate data. Is that right?

Are you planning on writing complete reference documentation at some point, like https://www.sqlite.org/limits.html, https://www.sqlite.org/howtocorrupt.html, https://www.sqlite.org/lang.html, https://docs.python.org/2/reference/index.html, and https://golang.org/ref/spec? Or is using Noms going to be more of a UTSL kind of thing? The documentation I've found so far seems to be purely tutorial and introductory in nature.

(I'm really glad you're writing Noms, by the way. There's an enormous need for it.)


Right - the challenge is that with dynamic typing and without schema validation, it's incredibly easy to break any strongly typed client/consuming application. You think you're dealing with a `List<Number>`; you have Go/Java/Haskell/whatever apps consuming it in a strongly typed fashion using their idiomatic record types; then a user accidentally sends in a single value which turns the tree of values into a `List<Number|String>`, and all your consuming apps break.

Given that schema validation ("does this instance match this type?") is simpler to implement than schema inference ("what is the type of this instance?"), it's surprising to me to deliver inference first...


I don't understand how we could have gone in the opposite direction.

Schema validation for us is just looking at the type requirements of the dataset and the type of the value and seeing if they are compatible. How can we do that without first knowing the type of the value?
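That ordering can be made concrete with a toy sketch (my own illustration, not the Noms codebase): `validate` is literally defined as "infer the value's type, then check it against the declared type", so inference has to exist before validation can.

```go
package main

import "fmt"

// Toy type model: a leaf kind, or a List with a union of element kinds.
type Type struct {
	Kind  string          // "Number", "String", or "List"
	Union map[string]bool // allowed element kinds when Kind == "List"
}

// inferType answers "what is the type of this instance?".
func inferType(v interface{}) Type {
	switch v := v.(type) {
	case float64:
		return Type{Kind: "Number"}
	case string:
		return Type{Kind: "String"}
	case []interface{}:
		u := map[string]bool{}
		for _, e := range v {
			u[inferType(e).Kind] = true
		}
		return Type{Kind: "List", Union: u}
	}
	return Type{Kind: "Unknown"}
}

// validate answers "does this instance match this declared type?".
// It is built on top of inference plus a compatibility check,
// which is exactly the dependency described above.
func validate(v interface{}, declared Type) bool {
	t := inferType(v)
	if declared.Kind != t.Kind {
		return false
	}
	for k := range t.Union {
		if !declared.Union[k] { // every element kind must be allowed
			return false
		}
	}
	return true
}

func main() {
	numbers := Type{Kind: "List", Union: map[string]bool{"Number": true}}
	fmt.Println(validate([]interface{}{42.0, 44.0}, numbers))  // true
	fmt.Println(validate([]interface{}{42.0, "foo"}, numbers)) // false
}
```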



