
Can we toss in the work unsloth does too as an unsung hero?

They provide excellent documentation and they’re often very quick to get high quality quants up in major formats. They’re a very trustworthy brand.



Yeah, they're the good guys. I suspect the open source work is mostly an advertisement for selling consulting and services to enterprises. Otherwise, the work they do doesn't make sense to offer for free.


Haha, for now our primary goal is to expand the market for local AI and educate people on RL, fine-tuning, and running quants :)


Amazing work and people should really appreciate that the opportunity costs of your work are immense (given the hype).

On another note: I'm a bit paranoid about quantization. I know people are no longer good at discerning model quality at these levels of "intelligence", and I don't think a vibe check really catches the nuances. How hard would it be to systematically evaluate the different quantizations? E.g. on the Aider benchmark that you used in the past?

I was recently trying Qwen 3 Coder Next, and there are benchmark numbers in your article, but they seem to be for the official checkpoint, not the quantized ones. It isn't really made clear (and chatbots mistake them for benchmarks of the quantized versions, btw).

I think systematic/automated benchmarks would really bring the whole effort to the next level. Basically something like the bar chart from the Dynamic Quantization 2.0 article but always updated with all kinds of recent models.
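
For concreteness, a rough sketch of the kind of automation I mean, looping GGUF quants through llama.cpp's llama-perplexity tool as a cheap stand-in for a real benchmark (the file names are made up, and the output line parsed here may differ between llama.cpp versions):

    # Hypothetical harness: score every quant of a release with
    # llama.cpp's llama-perplexity. File names are illustrative only.
    import re
    import subprocess

    quants = [
        "model-BF16.gguf", "model-Q8_0.gguf", "model-Q6_K.gguf",
        "model-Q5_K_M.gguf", "model-Q4_K_M.gguf", "model-Q3_K_M.gguf",
    ]

    results = {}
    for q in quants:
        proc = subprocess.run(
            ["./llama-perplexity", "-m", q, "-f", "wiki.test.raw"],
            capture_output=True, text=True,
        )
        out = proc.stdout + proc.stderr
        # llama-perplexity prints a final "PPL = ..." estimate;
        # the exact wording varies, so the regex is best-effort.
        m = re.search(r"PPL\s*=\s*([0-9.]+)", out)
        results[q] = float(m.group(1)) if m else None

    for q, ppl in results.items():
        print(f"{q}: {ppl}")

Swap the subprocess call for an Aider run and this basically becomes the always-updated bar chart I'm imagining.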


Thanks! Yes, we actually did think about that - it can get quite expensive sadly. Perplexity benchmarks over short context lengths with small datasets are doable, but perplexity isn't an accurate measure. We're currently investigating the most efficient course of action for evaluating quants - will keep you posted!
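
To illustrate the cheap-but-rough option: a minimal short-context perplexity check with HuggingFace transformers (a sketch only - the model ID and text file are placeholders, not our actual setup):

    # Short-context perplexity: cheap proxy, not an accurate quality measure.
    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "some-org/some-model"  # placeholder
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
    model.eval()

    text = open("sample.txt").read()  # small evaluation corpus
    ids = tok(text, return_tensors="pt").input_ids[:, :2048]  # short context only

    with torch.no_grad():
        # passing labels=input_ids makes the model return mean cross-entropy loss
        loss = model(ids, labels=ids).loss

    print(f"perplexity = {math.exp(loss.item()):.2f}")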


> How hard would it be to systematically evaluate the different quantizations? E.g. on the Aider benchmark that you used in the past?

Very hard. $$$

The benchmarks are not cheap to run. It'll cost a lot to run them for each quant of each model.


Yes, sadly very expensive :( Maybe a select few quants could happen - we're still figuring out the most economical and efficient way to benchmark!


Roughly how much does it cost to run one of the popular benchmarks? Are we talking $1,000, $10,000, or $100k?


Oh, it's more that time is the issue - each benchmark takes 1-3 hours ish to run on 8 GPUs, so running all quants for each model release can be quite painful.

Assume AWS spot pricing of say $20/hr for 8 B200 GPUs - at 1-3 hours per run that's $20-$60 ish per quant. Assuming we benchmark BF16, 8-bit, 6, 5, 4, 3 and 2 bits, that's 7 ish tests, so $140 ish to $420 ish per model. Time wise, 7 hours to 1 day ish.

We could run them after a model release which might work as well.

And this is for just one benchmark.
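
Putting those numbers into a quick sketch (the spot rate and runtimes are the rough estimates above, not quoted prices):

    # Back-of-envelope cost/time for benchmarking one model release.
    rate_per_hour = 20.0          # USD, 8x B200 spot (assumed)
    hours_per_run = (1.0, 3.0)    # each benchmark run takes 1-3 hours
    quants = ["BF16", "8-bit", "6-bit", "5-bit", "4-bit", "3-bit", "2-bit"]

    lo_cost = rate_per_hour * hours_per_run[0] * len(quants)  # $140
    hi_cost = rate_per_hour * hours_per_run[1] * len(quants)  # $420
    lo_hrs = hours_per_run[0] * len(quants)                   # 7 hours
    hi_hrs = hours_per_run[1] * len(quants)                   # 21 hours, ~1 day

    print(f"cost per model, one benchmark: ${lo_cost:.0f}-${hi_cost:.0f}")
    print(f"serial wall-clock: {lo_hrs:.0f}-{hi_hrs:.0f} hours")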


This would be amazing


Working on it! :)


I hope that is exactly what is happening. It benefits them, and it benefits us.


Not that unsung! We've given them our biggest workshop spot every single year we've been able to, and will keep doing so until they're tired of us: https://www.youtube.com/@aiDotEngineer/search?query=unsloth


Appreciate it immensely haha :) Never tired - always excited and pumped for this year!


Oh thank you - appreciate it :)


I'm a big fan of their work as well, good shout.


Thank you!



