Haven't tried it, but applications like these are the best uses of LLMs: translation from an informal human language into a formal language that the human can review, and that can then be manipulated further with formal tools.
I believe the attempts to have LLMs "write the code" or "reason" are wrong. What we need is to move to more formal languages to replace the ambiguity of natural language.
I agree with this for sure. One of the most effective things I built myself with an LLM was a natural language search for complex transaction data. I wrote the available filters and the code to apply them to the data, but instead of a complex UI to set the filters up you just write natural language and an LLM converts it to my filter data structures for you. This feels like it's playing into their strength too, it's really good at understanding the data structure I want it to return and sticking to it.
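A rough sketch of that pattern, where the `Filter` shape, the operator names, and `apply_filters` are all hypothetical stand-ins for whatever the actual project uses; the LLM call itself (which would be asked to emit JSON matching the `Filter` shape) is left out:

```python
from dataclasses import dataclass

# Hypothetical filter structure: the LLM is prompted to emit JSON
# matching this shape instead of free-form code or SQL.
@dataclass
class Filter:
    field: str       # e.g. "amount", "merchant"
    op: str          # "eq", "gt", "lt", "contains"
    value: object

def apply_filters(rows, filters):
    """Apply every filter to every row; a row survives only if all match."""
    ops = {
        "eq": lambda a, b: a == b,
        "gt": lambda a, b: a > b,
        "lt": lambda a, b: a < b,
        "contains": lambda a, b: b in a,
    }
    return [r for r in rows
            if all(ops[f.op](r[f.field], f.value) for f in filters)]

# "transactions over 100 at coffee shops" might come back from the LLM as:
filters = [Filter("amount", "gt", 100), Filter("merchant", "contains", "coffee")]
rows = [
    {"amount": 120, "merchant": "Blue Bottle coffee"},
    {"amount": 80, "merchant": "coffee cart"},
    {"amount": 200, "merchant": "hardware store"},
]
print(apply_filters(rows, filters))  # only the first row matches
```

The nice property is exactly the one described: the model only has to produce a small, reviewable data structure, and all the actual filtering logic stays in deterministic code you wrote yourself.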
Totally agree. I've recently built this: https://github.com/ludovicianul/sol which lets you use natural language to query your git history for different metrics.
one thing i personally admire/appreciate about simonw's work is how beautifully his tools, services and research "stacks" to unlock higher and higher levels of abstraction. so many of the little one-off scraper/notifier/transformer repos on his github showcase the amazing composability of the tooling he built previously.
DuckDB is a more convenient and performant tool for analyzing large CSV/JSON files. Shameless plug: if you’re interested in combining DuckDB with Claude using the recently announced Model Context Protocol (MCP), check out the MyDuck Server project (https://github.com/apecloud/myduckserver) that I’m working on. You can follow this guide https://github.com/apecloud/myduckserver/blob/main/docs/tuto... to get started. The guide includes a conversation history showcasing how the free-tier Claude Haiku model successfully analyzed the Our World in Data Energy dataset.
MyDuck Server adds a Postgres frontend for DuckDB, allowing seamless interaction with DuckDB via the official Postgres MCP server.
That's the amazing thing about the apparently still enthusiastic acceptance of LLMs for tasks that are better done by experts. We've moved past this before, but now it sometimes seems almost perfectly okay to peddle uncertainty, hallucinations and random results. And no, it's not about democratizing computer use; it's about choosing the right resource for the job when the outcome actually matters. Everything else is a degeneration into vagueness or half-knowledge, not to mention the problems of missing or abandoned data privacy and (software) freedom.
I’d definitely prefer to have a professional certified DBA help me query my hobby project databases. But do you know one that’ll do it for (almost) free and is available 24/7?
Even at work, I can’t hog a single data engineer’s attention for hours on end without at least trying myself first.
If you have created a hobby project database, you have the skills to learn how to query it - it is probably part of your hobby and fun. If not, and if you do not want to open source it to get community support, you are better off cutting out the database part and choosing other technologies.
At work, the situation should not be so different. If your manager cannot provide you with the means to maintain the database according to the business needs, you / your boss / your team / your business / your company have a problem and should better choose technologies that are manageable and/or for which you have enough resources.
You seem to have some pretty detailed understanding of my hobby projects, and the way I enjoy doing them, considering you know neither them nor me!
> If you have created a hobby project database, you have the skills to learn how to query it - it is probably part of your hobby and fun.
Not at all in many cases. Many existing open source projects these days involve a database for better or worse, and I wouldn't enjoy porting their storage layer to something other than whatever database they already use.
I really don't need more engineering rabbit holes to go down in my life, neither at work nor in my free time. I do so by defining loose boundaries labeled "here be time sinks" and only go there if I'm really curious or it seems enjoyable, but not when I'm trying to get something else done.
LLMs have moved these boundaries somewhat, and I believe for the better.
> If the SQL query fails to execute (due to a syntax error of some kind) it
> passes that error back to the model for corrections and retries up to three
> times before giving up.
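The retry loop quoted above is easy to reconstruct. A minimal sketch, where `generate_sql` stands in for the actual LLM call (the prompt format and function names here are illustrative, not the tool's real implementation):

```python
import sqlite3

def ask_and_run(db, question, generate_sql, max_retries=3):
    """Retry pattern from the quote: on a syntax error, the error text
    is fed back to the model (here, generate_sql) for a correction."""
    prompt = question
    for _ in range(max_retries):
        sql = generate_sql(prompt)
        try:
            return db.execute(sql).fetchall()
        except sqlite3.OperationalError as err:
            # Feed the failing SQL and the error message back to the model
            prompt = f"{question}\nPrevious SQL:\n{sql}\nError: {err}\nPlease fix."
    raise RuntimeError("giving up after retries")

# Simulated model: fails once with a typo, then corrects itself
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (x)")
db.execute("INSERT INTO t VALUES (1)")
attempts = iter(["SELEC x FROM t", "SELECT x FROM t"])
print(ask_and_run(db, "show me x", lambda p: next(attempts)))  # [(1,)]
```

The key detail is that the error message itself goes into the follow-up prompt, which is usually enough context for the model to repair a simple syntax mistake.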
Funnily enough, syntax errors are the one thing you can completely eliminate in LLMs, simply by masking the output token probability vector down to just those tokens that are valid next symbols.
Yeah, that's a pretty solid approach if the LLM you're using (and the third party that hosts it, if you're not self-hosting) supports that.
One minor footgun I've seen with that approach is that while the model is guaranteed to produce syntactically valid outputs, it can still "trap" the model into outputting something both wrong and low-probability if you design your schema badly in specific ways (contrived example: if you're doing sentiment analysis of reviews and you have the model pick from the enumeration ["very negative", "negative", "slightly negative", "neutral", "positive"], then the model might encounter a glowing review and write "very", intending to follow it up with " positive", but since " positive" isn't a valid continuation it ends up writing "very negative" instead).
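That trap can be reproduced with a toy greedy decoder constrained to the enumeration, plus a hand-written "model" that scores tokens the way a glowing review plausibly would (everything here, the scores, the tokenizer, the function names, is an illustrative assumption, not any real model's behavior):

```python
import re

CHOICES = ["very negative", "negative", "slightly negative", "neutral", "positive"]

def fake_model_score(prefix, token):
    # A stand-in model that strongly wants to say "very ... positive"
    prefs = {"very": 3.0, " positive": 2.5, "positive": 2.0}
    return prefs.get(token, 0.1)

def tokenize(s):
    # Word-level tokens, keeping the leading space (BPE-style)
    return re.findall(r"\s?\S+", s)

def constrained_greedy(choices):
    out = []
    while True:
        prefix = "".join(out)
        # Mask: only tokens that keep us on a valid path through some choice
        valid = set()
        for c in choices:
            toks = tokenize(c)
            if toks[:len(out)] == out and len(toks) > len(out):
                valid.add(toks[len(out)])
        if not valid:
            return prefix
        out.append(max(valid, key=lambda t: fake_model_score(prefix, t)))

print(constrained_greedy(CHOICES))  # "very negative", not "very positive"
```

The model picks "very" (its highest-scoring first token), and from there " negative" is the only valid continuation, so the constraint forces out the exact opposite of what it meant to say. The usual fix is to design the enumeration so no choice is a misleading prefix of another sentiment.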
What a tired argument. Is this what the field is reduced to? Computers are supposed to be fast, deterministic, and correct. It is profoundly disappointing that we’re regressing to a system that is as mediocre as the average bullshitter and laud it as an achievement.
The thing with humans is that you can build trust. I know exactly who to ask if I have a question about music, or medicine, or a myriad of other topics. I know those people will know the answers and be able to assess their level of confidence in them. If they don’t know, they can figure it out. If they are mistaken, they’ll come back and correct themselves without me having to do anything.
Comparing LLMs to random humans is the wrong methodology. Of course, Upton Sinclair had a point so I don’t expect to convince someone who is monetarily invested in having this broken assumption succeed.
Is not the answer here the same? Build the trust in the AI systems. Benchmarks help and are a start. Consistent results over time are another measure. We'll learn over the next few years which models, or which companies developing model can be trusted for certain sets of tasks. If these exceed alternatives, and have consistency over a margin I can reason about, I'll make use of them, same as any other tool.
I can ask the exact same question of an LLM multiple times and get different answers with the same degree of confidence. Hard to trust that, and also hard to fix.
Which wouldn’t be so problematic if people didn’t just turn off their brains when interacting with them.
Either way, everything you’re suggesting are possibilities for the future, which may or may not pan out. The bad comparisons to humans are happening today.
There are plenty of use cases where this is a great starting place to be able to jump in and start to ask questions.
I have worked on a variety of acquisitions where I didn’t need to have 100% certainty, I just needed a starting place to make sense of some bizarre home grown financials and something like this would’ve been great to be able to quickly probe and come back with thoughtful questions.
Instead I spent my time either tediously figuring out a schema or waiting for someone in finance to come back with ad-hoc analysis that also had a bunch of errors.
> Why are you pretending this is a binary outcome?
You seem confused. My argument is not at all concerned with outcomes and is not binary in the slightest. My point is clearly spelled out:
> Comparing LLMs to random humans is the wrong methodology.
I’m not saying LLMs are never useful or anything of the sort. What I am saying is that defending LLMs on the basis of “some random undefined human would do the same” is a poor argument which shows a deep misunderstanding of human collaboration.
In other words, just like when I run an SQL query myself or ask a team member to do it.
The correct usage here is of course not to just run a natural language query and blindly trust the results. Sanity checking both the query and results is essential.
I still find LLMs incredibly useful for exposing new SQL functionality to me, or for refactoring larger existing queries into a very different approach (since SQL unfortunately does not allow defining query components in a modular way, which would let me avoid that).
> refactoring larger existing queries to a very different approach
Huh, that's really interesting. I've found LLMs (mostly Claude) to be pretty bad at writing SQL (they love cross joins for some reason), so it's interesting that others are getting good results. What models are you using? Do you do any particular prompt engineering or anything different?
I usually start out by pasting a simple (correct, executing) stub query in, prefacing it with "this query does <general thing I'm trying to do>", and then go step by step: "Filter for x", "now add a join to this other table <definition>" etc., and build the larger query iteratively.
Another pattern I've found pretty useful: "This isn't working. Build a small sample dataset to exercise your query, and I'll paste the results back to you so we can both see what might be going wrong."
Basically, I treat it as an intern, not as an oracle of truth.
Huh that's helpful, thanks! To be fair I mostly only need it for new dialects but my successful attempts with other tools seem to follow the same pattern.
I think a big difference is that people do things in predictably stupid ways, but an LLM? Who knows. That could be because I've worked with stupid people my entire life and I know the boundaries of their chaos.
If you read at least 3 paragraphs in, you'll see that this tool attempts to generate a query using an LLM. Dismissing a tool before attempting to understand it is astounding.
If a structured language for querying databases already exists, why wouldn't one use that to query the database?
A tool that returns SQL from human language sounds great, but not one that runs the query unchecked. I speak four (human) languages, and sometimes I'll mistranslate something. Not to mention that this tool won't even bubble up the first two syntax errors that it generates - not very confidence-inspiring.
I'm very good at SQL. I still use LLMs to write queries for me on a daily basis because they are faster at it than I am, and I can very quickly review their work to check that they didn't make any mistakes.
Same applies for JavaScript and Python and Bash and AppleScript and Go and jq and dozens of other programming languages.
If you're going to use LLMs as a productivity boost you need to get good at quickly verifying that they've done the right thing. Effectively that means investing more in QA skills and habits, which is generally valuable anyway.
Because it’s honestly quite bad and clunky by modern standards (I specifically miss composability of sub-query components, which makes every query feel like a throwaway effort), many people don’t enjoy learning it, and at least personally the likelihood of getting a given advanced query right without lots of double checking is pretty low.
I do find it much easier to read/validate than to write though, which makes it an excellent application for LLM usage, in my experience.
I once told my son when he was 4 that if his tablet crashes, or a video stops working, he needs to ask it nicely. It usually worked - the issue was lack of patience. Still, it made me smile when I saw him every now and then putting his most polite voice and saying, "please, can you play my video?".
But soon... we won't be able to do anything without interacting with an LLM!
It can use any of the LLMs supported by my LLM tool and its plugins - it defaults to OpenAI but you can configure it to use something running locally like Ollama instead.
It always strikes me as odd how much of using LLMs is doing the dumb plumbing of transferring/connecting data in and out of the LLM and whatever domain things are involved. The LLM is doing the interesting stuff, and we get to do data entry.
So now there are tools like this to help automate this simple work, but I wonder why the LLM can’t already handle it… “create a tool to automate querying a SQLite database using an LLM”
> The LLM is doing the interesting stuff, and we get to do data entry.
I'd say it's just the opposite:
I'm telling the LLM what outcome I want (and, currently, how I want it to go about doing that, since it often can't figure it out itself), and it handles the low-level details of getting the SQL syntax right, papering over minor differences in available utility functions and syntactic sugar across databases.
From the article: "It adds LLM as a dependency, and takes advantage of LLM’s Python API to abstract over the details of talking to the models. This means sqlite-utils-ask can use any of the models supported by LLM or its plugins"
I do wonder if the command line will get a new generation of users with more tools like this that make systems more accessible.
Our bet with a similar tool (getdot.ai) is that the shift goes into existing communication tools (Slack, etc.), but CLI is an interface I personally enjoy.