Sharing this both because it is phenomenal work that will bring peace to 18 families and because it addresses some of the questions raised when I posted my Gamow Labs founding story last week (https://news.ycombinator.com/item?id=48471048).
While I haven't published my results yet (dang OpenAI beat me to it!), they are very similar and I have no doubt that this technology will significantly improve the detection and diagnosis of rare disease, which is a nice win for humanity.
I'm in the process of getting my first batch of long reads, but I am skeptical that this is "just" what's needed. There is little doubt that long read > short read, but I think that computational techniques for both need to be improved significantly.
There is already some clinical evidence to support my hypothesis. The first clinical long read trial at Children's Mercy Kansas City showed a 10% bump in diagnostic rate, which is great but does not fully solve the problem: https://news.childrensmercy.org/unlocking-answers-faster-chi...
Thanks for sharing (I am OP). Lots of really interesting computational problems in genetics. The original speech recognition models, HMMs, diffused to genetics ahead of many other fields with which we associate machine learning. Look forward to watching this tech talk.
Opus-4.8/GPT-5.5 coupled with their respective harnesses will do a good job for more vanilla cases, but I realized that a custom harness, a set of skills, and MCP servers became essential as complexity increased (mostly hard SVs). I'll get an eval and more technical post out in a few weeks.
If you are open to chatting about your experience, I'd love to hear from you. I spend a lot of time learning from and supporting other rare disease families these days.
And the anti-natal thing was kind-of joking not joking. I do know lots of people with kids there now, but when my wife first got pregnant, we were alone.
Certainly. If there’s something I can do for you that will assist I’m happy to do so. Email on profile and I don’t mind chatting in person in SF should you so desire.
This was not intended to be a technical post (obvious I hope).
I'm planning on getting one out in the next few weeks characterizing the system and how it performed on real clinical use-cases vs. alternatives and existing tools.
The TL;DR is that Gamow Labs is a harness and interface company on top of SOTA LLMs as you suggested, but my harness and interface outperform existing tools. While this approach would have earned me the "wrapper company" label last year, I hope the success of OpenEvidence, Harvey, Perplexity, and so on has opened minds with respect to the value here.
It was only working through clinical cases that I realized how much more I needed beyond dropping raw reads into Codex.
Can you say a little more about how current genomic techniques rely on human interpretation? Is that mostly where you use an LLM (to act human-like), or is your approach different than that?
Current genomic techniques involve humans using a lot of different software (search, ranking models, visualizations, alignment algorithms, etc.) and synthesizing the results manually into a diagnosis.
Your assumption is correct about my technique. I cloned (and expanded) this workflow into an LLM harness, so the LLM is basically orchestrating a bunch of tools that normally humans would use (and writing the conclusions and doing all the standard LLM stuff).
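To make the orchestration idea concrete, here is a toy sketch of that loop. To be clear, this is not my actual harness: the tool names (`align_reads`, `call_variants`, `rank_candidates`), their outputs, and the `scripted_planner` stand-in are all hypothetical placeholders. In a real system the planner would be an LLM call that picks the next tool based on what it has seen so far.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Each "tool" takes a case identifier and returns a text summary.
# These are hypothetical stand-ins for the software a human analyst
# would normally run by hand (aligners, callers, ranking models).
Tool = Callable[[str], str]
History = List[Tuple[str, str]]

TOOLS: Dict[str, Tool] = {
    "align_reads": lambda case_id: f"aligned reads for {case_id}",
    "call_variants": lambda case_id: f"called variants for {case_id}",
    "rank_candidates": lambda case_id: f"ranked candidates for {case_id}",
}


def scripted_planner(history: History) -> Optional[str]:
    """Stand-in for the LLM: returns the next tool name, or None when done.

    A real planner would read the tool outputs in `history` and decide
    what to run next; this one just walks a fixed order.
    """
    order = ["align_reads", "call_variants", "rank_candidates"]
    done = len(history)
    return order[done] if done < len(order) else None


def run_case(
    case_id: str,
    tools: Dict[str, Tool],
    choose_next: Callable[[History], Optional[str]],
    max_steps: int = 10,
) -> History:
    """Run tools chosen by the planner until it stops or max_steps is hit."""
    history: History = []
    for _ in range(max_steps):
        tool_name = choose_next(history)
        if tool_name is None:
            break
        if tool_name not in tools:
            raise KeyError(f"planner asked for unknown tool: {tool_name}")
        history.append((tool_name, tools[tool_name](case_id)))
    return history


if __name__ == "__main__":
    for name, output in run_case("case-001", TOOLS, scripted_planner):
        print(f"{name}: {output}")
```

The point of the shape is that the tool outputs accumulate in one history that the planner can read, which is roughly the synthesis step a human does manually today. The `max_steps` cap is just a guard so a misbehaving planner can't loop forever.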
While I am truly grateful to him and the team for their contributions to neonatal genetics (and for hosting me in San Diego for a few days to show me how I could help), Rady was actually the unnamed lab that failed to diagnose my son.
And this happens all the time. The WGS NICU diagnostic rate is only ~30%, depending on who you ask. Just because people have been working at this for a decade and products exist doesn't mean it's a solved problem.
I don't know if you read until the end of my post, but I did run a small experiment in collaboration with an academic geneticist and outperformed the first-line clinical labs across the board. My approach, which is essentially Claude Code for genetics, is fundamentally different from how this work is done today and seems to perform much better in early experiments. Time will tell if this generalizes to all clinical work.
I'm planning on publishing evals and benchmarks in the next few weeks, but out-of-the-box systems actually don't do very well for a variety of reasons.
Thanks for the reply. I have read your post, but I obviously haven't seen the preprint, and without knowing the details I remain skeptical.
> The WGS NICU diagnostic rate is only ~30%, depending on who you ask.
Agreed. It does not automatically mean, however, that it can be significantly improved with better variant interpretation or better analysis of the same WGS data in a general sense.
> I'm planning on publishing evals and benchmarks in the next few weeks, but out-of-the-box systems actually don't do very well for a variety of reasons.
Happy to see it. I wish you all the luck and will be the first one praising your solution if I see convincing results.
> Agreed. It does not automatically mean, however, that it can be significantly improved with better variant interpretation or better analysis of the same WGS data in a general sense.
I wouldn't say anything is automatic or taken for granted, but it is actually relatively common for more thorough reanalysis to uncover something that the first pass missed. I hinted at this in the post, but the reason that this doesn't happen today is human bandwidth.
A core part of my thesis is that this highly specialized human bandwidth can be scaled with AI.
It may work. It may not work. But I would feel bad if I didn't give it a try.
> Happy to see it. I wish you all the luck and will be the first one praising your solution if I see convincing results.
The general point is that separating PM and eng doesn't make sense any longer. Which subsumes which is an interesting debate.
Your argument that 4.6 Opus makes the engineering skill set useless is totally false and maybe shows you haven't built anything complicated, but it is possible that Opus 5.2 will get there.
100%. PMs at startups already wear many hats and AI helps them do that even better.
But to this sister comment's point, I do think that the dedicated PM role will vanish and the classic BigCo PM will need to look a lot more like the startup one.
I think that all PMs will need to get onto the engineering, design, or research ladder. We are already seeing companies eliminate the function here and there and I expect the trend to continue.
This seems crazy to me. I am a PM and I am busier than ever. People are waking up to the idea that code is cheap and things can change faster now, so deciding _what_ to make and prioritise in the deluge of ideas coming to prod is becoming completely essential.