Indeed, thanks for pointing this out and for the links. In my excitement I misread it as an MR from the fork to the main project.
I don’t think I’m able to fix the title though.
I find it quite exciting to see some results from the effort to understand whether TurboQuant's main ideas can be applied to model weights. There are other similar projects, so we'll see, but some of this fork's results look promising.
I see mentions that it reduced model size, but not how much memory was saved. I guess that depends on how it's used? But I'd be very curious to see some benchmarks for that.
The only legit PR I can find is this [0] and it's still open.
There are currently a lot of rejected vibe-coded PRs: [1] (rejected for violating the project's AI policy).
The OP's PR says it was generated with Claude Code, so it has a very low chance of getting merged upstream.
[0] https://github.com/ggml-org/llama.cpp/pull/21089
[1] https://github.com/ggml-org/llama.cpp/pulls?q=Turboquant+is%...