lastdong 31 days ago


I think the headline is misleading. It's some random fork of llama.cpp; I can't find evidence that TurboQuant was actually added to llama.cpp proper.

The only legit PR I can find is this [0] and it's still open.

There are currently a lot of rejected vibe-coded PRs: [1] (violations of the project's AI policy).

The OP's PR says it was generated with Claude Code, so it has a very low chance of getting merged upstream.

[0] https://github.com/ggml-org/llama.cpp/pull/21089

[1] https://github.com/ggml-org/llama.cpp/pulls?q=Turboquant+is%...


Indeed, thanks for pointing this out and for the links. In my excitement I misread it as an MR from the fork to the main project. I don't think I'm able to fix the title, though.

I find it quite exciting to read results from an effort to understand whether TurboQuant's main ideas can be applied to model weights. There are other similar projects, so we'll see, but some of this fork's results look promising.
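For readers unfamiliar with what "applying quantization to model weights" means in practice, here's a toy sketch of the baseline idea (round-to-nearest symmetric scalar quantization of one block of weights). This is my own illustration, not TurboQuant's algorithm and not code from the fork:

```python
# Toy illustration of block-wise round-to-nearest quantization.
# NOT TurboQuant's method -- just the baseline that such schemes refine.

def quantize_block(weights, bits=4):
    """Map floats to signed ints in [-(2**(bits-1)), 2**(bits-1) - 1],
    sharing one float scale per block."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_block(q, scale):
    """Recover approximate floats: only the ints and the scale are stored."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.07]
q, scale = quantize_block(weights)
approx = dequantize_block(q, scale)
```

The point is that only the small integers plus one scale per block need to be stored, which is where the size reduction comes from; the research question is how to pick the mapping so that `approx` stays close enough to `weights` for model quality to hold up.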


I see mentions that it reduced model size, but not how much memory was saved at runtime. I guess that depends on how it's used? I would be very curious to see some benchmarking for that.
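For rough back-of-the-envelope numbers (my own estimate, not measurements from the fork), weight memory scales linearly with bits per weight, so you can sketch the savings like this:

```python
# Back-of-the-envelope: weight memory ~= parameter count * bits/weight / 8.
# Ignores activations and KV cache, which quantizing weights does not shrink.

def weight_memory_gib(params_billion, bits_per_weight):
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

fp16 = weight_memory_gib(7, 16)   # a 7B model at 16 bits: ~13 GiB
q4   = weight_memory_gib(7, 4.5)  # ~4.5 bits/weight, assuming block-wise
                                  # formats carry ~0.5 bit of scale overhead
```

So going from 16-bit to a ~4.5-bit format cuts weight memory by roughly 3.5x, but real end-to-end savings depend on the runtime (KV cache, activations, dequantization buffers), which is exactly why benchmarks would be useful.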


Great news! I expect this to get implemented in all the major inference runners pretty fast. See also: https://news.ycombinator.com/item?id=47637422




