Indeed, thanks for pointing this out and for the links. In my excitement I misread it as an MR from the fork to the main project.
I don’t think I’m able to fix the title though.
I find it quite exciting to see some results from the effort to understand whether TurboQuant's main ideas can be applied to model weights. There are other similar projects, so we'll see, but some of this fork's results look promising.
I see mentions that it reduced model size, but not how much memory was saved. I guess that depends on how it's used? But I'd be very curious to see some benchmarks for that.
The only legit PR I can find is this [0] and it's still open.
There are currently a lot of rejected vibe-coded PRs: [1] (rejected for violating the project's AI policy).
The OP's PR says it was generated with Claude Code, so it has a very low chance of getting merged upstream.
[0] https://github.com/ggml-org/llama.cpp/pull/21089
[1] https://github.com/ggml-org/llama.cpp/pulls?q=Turboquant+is%...