Patrick_Devine · 2026-06-03T20:30:15 1780518615

I realize this is a little confusing; we're working w/ the MLX team to bring MLX to other platforms, but we're not quite there yet. The `gemma4:12b-nvfp4` model is specifically for the MLX engine.

For the GGUF 4bit variant (i.e. non-macs) you'll need `gemma4:12b-it-q4_K_M` which I just pushed. You'll also need to upgrade to version 0.30.4 which we're just about to release (it's in prerelease and we're running through our last regression tests).
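As a rough sketch of the rule above (not anything Ollama ships), the tag choice and the version requirement could be expressed like this. The helper names `pick_gemma_tag`, `parse_version`, and `new_enough_for_gguf` are hypothetical, and treating every Mac as an MLX machine is an assumption taken from the "non-macs" wording.

```python
import platform
import re

# Tags and minimum version are taken from the comment above.
MLX_TAG = "gemma4:12b-nvfp4"        # MLX engine only
GGUF_TAG = "gemma4:12b-it-q4_K_M"   # GGUF 4-bit variant for non-Macs
MIN_GGUF_VERSION = (0, 30, 4)


def pick_gemma_tag(system=None):
    """Return the 12b tag for this platform (assumption: Mac means MLX)."""
    system = system or platform.system()
    return MLX_TAG if system == "Darwin" else GGUF_TAG


def parse_version(text):
    """Extract the first MAJOR.MINOR.PATCH triple found in a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def new_enough_for_gguf(version_text):
    """True if the version is at least 0.30.4 (prerelease suffixes are ignored)."""
    return parse_version(version_text) >= MIN_GGUF_VERSION


if __name__ == "__main__":
    print(pick_gemma_tag())
```

The version string has to be supplied by the caller; the sketch deliberately does not shell out to the `ollama` binary.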

embedding-shape · 2026-06-03T22:12:40 1780524760

I gotta say, having both "gemma4:12b-mlx-bf16" and "gemma4:12b-nvfp4" be MLX-specific, and not labeling all of the MLX-specific ones as such, is a bit different than "little confusing" and more "set up to be confusing" :)

> You'll also need to upgrade to version 0.30.4 which we're just about to release

Interesting, wasn't Google coordinating today's release with you? Considering the blog post seems to have gone out way before the release has even been cut.

Patrick_Devine · 2026-06-03T22:22:57 1780525377

Given the model was just republished by Google 15 minutes ago and we're going to have to redo everything (and everyone will have to redownload for all platforms -- not just Ollama), I'll just say that sometimes things don't work out exactly the way you want them to. :-D

That said, I think the gemma4:12b-nvfp4 model is pretty solid. It's been tuned with Nvidia's model optimizer. I've been waiting on the results for MMLU-Pro, but I'll have to retrigger that after reconverting.

embedding-shape · 2026-06-03T23:40:48 1780530048

> Given the model was just republished by Google 15 minutes ago

Hah, missed that! Guess that's slightly neat though, you get a second chance ;) NVFP4 has been a blast to use across a wide range of models, seems to work really well, at least with vLLM and an Nvidia card.

spicySpy · 2026-06-04T11:05:51 1780571151

Would you mind sharing the link to `gemma4:12b-it-q4_K_M`?

Patrick_Devine · 2026-06-03T20:19:33 1780517973

I haven't yet pushed the MTP enabled gemma4 12b model for Ollama because in my testing I wasn't getting a performance bump. The other gemma4 MTP models should work OK right now, but there are some fixes we're just about to push. This is specifically for the MLX backend.

dofm · 2026-06-03T20:31:41 1780518701

Thanks for your reply. I will go back and look at Ollama again.

So much to learn but this news has really vindicated my decision to direct my limited span of concentration and focus to learning how to use open weights models and opencode.

Patrick_Devine · 2026-05-05T18:17:46 1778005066

In my testing the Gemma 4 31b model had the biggest speed boost in Ollama w/ the MLX runner for coding tasks (at about 2x). Unfortunately you'll need a pretty beefy Mac to run it because quantization really hurts the acceptance rate. The three other smaller models didn't perform as well because the validation time of the draft model ate up most of the performance gains. I'm still trying to tune things to see if I can get better performance.

You can try it out with Ollama 0.23.1 by running `ollama run gemma4:31b-coding-mtp-bf16`.
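To make the acceptance-rate and validation-cost trade-off above a bit more concrete, here is a minimal sketch of the usual back-of-the-envelope model for speculative decoding. It is not how Ollama's MLX runner measures anything: it assumes each drafted token is accepted independently with the same probability, and `draft_cost_ratio` (cost of one draft token relative to one pass of the main model) is an assumed parameter. The numbers in the example are illustrative, not measurements.

```python
def expected_tokens_per_step(acceptance_rate, draft_len):
    """Expected tokens produced per main-model pass.

    Assumes each of the draft_len drafted tokens is accepted independently
    with probability acceptance_rate, and drafting stops at the first reject.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be between 0 and 1")
    if acceptance_rate == 1.0:
        return float(draft_len + 1)
    return (1.0 - acceptance_rate ** (draft_len + 1)) / (1.0 - acceptance_rate)


def estimated_speedup(acceptance_rate, draft_len, draft_cost_ratio):
    """Estimated speedup over plain decoding under the assumptions above."""
    # One step costs draft_len draft tokens plus one pass of the main model.
    step_cost = draft_len * draft_cost_ratio + 1.0
    return expected_tokens_per_step(acceptance_rate, draft_len) / step_cost


if __name__ == "__main__":
    # Illustrative inputs only: (acceptance rate, draft length, draft cost ratio)
    for label, args in [
        ("high acceptance, cheap draft", (0.8, 4, 0.1)),
        ("low acceptance, cheap draft", (0.5, 4, 0.1)),
        ("high acceptance, costly draft", (0.8, 4, 0.4)),
    ]:
        print(f"{label}: x{estimated_speedup(*args):.2f}")
```

With these made-up inputs the estimate drops from about 2.4x to about 1.4x when the acceptance rate falls from 0.8 to 0.5, and to about 1.3x when the draft cost ratio rises from 0.1 to 0.4, which is the shape of the effect described above.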

Patrick_Devine · 2026-04-24T18:54:25 1777056865

I wish they would do this when you're boarding the plane. I get that there is essential information that everyone needs to know, but if you're a frequent flier you've probably heard the "put your larger carry-on in the overhead bin and your smaller bag underneath the seat in front of you" hundreds, if not thousands of times.

AlotOfReading · 2026-04-24T19:12:47 1777057967

There's a large subpopulation of people flying who seem to have no idea how planes and airports work. Maybe they're sleep deprived or it's their first time flying, but these announcements are targeted at them.

s0rce · 2026-04-24T19:26:15 1777058775

I think it's more likely that people do know; they just don't care, and it helps them to put their backpack overhead, so they do it anyway. There is minimal/no enforcement.

floren · 2026-04-24T20:12:33 1777061553

I'm very much a we-live-in-a-society, follow the rules kind of guy, but if I checked a bag and only have my backpack in the cabin, you bet your ass I'm going to try and find a place for it in the overhead instead of cluttering up where I want to put my feet. The flight attendants can go scold the passenger with the oversized roller + backpack + 20 liter "purse" instead.

s0rce · 2026-04-24T22:59:04 1777071544

Yes, the logical rule would be 1 bag in the overhead per person. If they enforced carry-on sizes strictly and charged less for checked luggage the problem would probably go away.

bsder · 2026-04-25T06:21:35 1777098095

It has nothing to do with price. I don't check luggage on domestic flights because of the enormous time lag for the airport to give me back my luggage. (There's also "United Breaks Guitars", but that's an independent problem)

If I could walk from the plane to the luggage area and my luggage was already there 90% of the time, I probably would check more things.

However, the US airports simply don't employ enough people to move the luggage around fast enough.

This is 100% correctable by employing more people. But some CEO needs another yacht, so they don't. So, I simply don't check luggage.

Gibbon1 · 2026-04-24T20:52:14 1777063934

I remember one time I had to fly back from a business trip on the Wednesday before Thanksgiving. Made me realize there is something about business travelers, they cut towards situationally aware and self conscientious types. The opposite of people flying the day before Thanksgiving.

I flew into the Orange County Airport before they tore it down and made it like the others. Felt very civilized. As I get older I find the hostile public spaces and infrastructure more and more annoying.

et-al · 2026-04-24T20:12:35 1777061555

Unfortunately there's also a large subpopulation of people flying who wear noise-cancelling headphones and have their eyes glued to their phones, choosing to be disengaged from their immediate surroundings.

advisedwang · 2026-04-24T19:07:11 1777057631

Especially flying with kids at naptime or bedtime. Trying to get an extremely tired toddler to fall asleep on a plane just to hear an announcement about in flight entertainment. OMG.

insane_dreamer · 2026-04-24T20:15:53 1777061753

Much much worse are the repeated advertisement “announcements” about signing up for their credit card or frequent flyer program

tencentshill · 2026-04-24T19:58:12 1777060692

There is a large and growing population of people leaving their home country for the first time ever, let alone by plane.

traderj0e · 2026-04-24T23:02:08 1777071728

That particular rule kinda depends on the airline and how full the flight is

Rygian · 2026-04-24T19:51:24 1777060284

There are apparently 10,000 people every day who learn about it for the first time, according to https://xkcd.com/1053/

Patrick_Devine · 2026-04-17T19:59:58 1776455998

Isn't this why NASA is developing the Electrodynamic Dust Shield [1] system?

[1] https://www.nasa.gov/image-article/nasas-dust-shield-success...

Patrick_Devine · 2026-04-16T17:47:25 1776361645

If you're on a Mac, use the MLX backend versions which are considerably faster than the GGML based versions (including llama.cpp) and you don't need to fiddle with the context size. The models are `qwen3.6:35b-a3b-nvfp4`, `qwen3.6:35b-a3b-mxfp8`, and `qwen3.6:35b-a3b-mlx-bf16`.

egorfine · 2026-04-17T11:11:38 1776424298

I was comparing various models on an M5 Pro with 48GB RAM, MLX vs GGUF, and found that MLX models have a higher time to first token (sometimes by an order of magnitude), while tokens/sec and memory usage are the same as GGUF.

Gemma 3 27B q4:

* MLX: 16.7 t/s, 1220ms ttft

* GGUF: 16.4 t/s, 760ms ttft

Gemma 4 31B q8:

* MLX: 8.3 t/s, 25000ms ttft

* GGUF: 8.4 t/s, 1140ms ttft

Gemma 4 A4B q8:

* MLX: 52 t/s, 1790ms ttft

* GGUF: 51 t/s, 380ms ttft

All comparisons done in LM Studio, all versions of everything are the latest.
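For anyone who wants the ratios rather than the raw numbers, a small sketch that divides the MLX figures by the GGUF figures. The values are copied from the list above; the `RESULTS` layout and the `ratios` helper are just one hypothetical way to organize them.

```python
# model -> backend -> (tokens per second, time to first token in ms),
# copied from the numbers posted above.
RESULTS = {
    "Gemma 3 27B q4": {"MLX": (16.7, 1220), "GGUF": (16.4, 760)},
    "Gemma 4 31B q8": {"MLX": (8.3, 25000), "GGUF": (8.4, 1140)},
    "Gemma 4 A4B q8": {"MLX": (52, 1790), "GGUF": (51, 380)},
}


def ratios(results):
    """Return model -> (throughput ratio, ttft ratio), each as MLX / GGUF."""
    out = {}
    for model, runs in results.items():
        mlx_tps, mlx_ttft = runs["MLX"]
        gguf_tps, gguf_ttft = runs["GGUF"]
        out[model] = (mlx_tps / gguf_tps, mlx_ttft / gguf_ttft)
    return out


if __name__ == "__main__":
    for model, (tps_ratio, ttft_ratio) in ratios(RESULTS).items():
        print(f"{model}: throughput x{tps_ratio:.2f}, ttft x{ttft_ratio:.1f}")
```

Throughput comes out within about 2% in all three cases, while MLX time to first token is roughly 1.6x, 21.9x, and 4.7x the GGUF figure.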

Patrick_Devine · 2026-03-31T22:41:35 1774996895

They are nvidia-fp4 weights. CUDA support isn't _quite_ ready yet, but we've got that cooking.

Patrick_Devine · 2026-03-31T22:39:22 1774996762

The 35b-a3b-coding-nvfp4 model has the recommended hyperparameters set for coding, not chatting. If you want to use it to chat you can pull the `35b-a3b-nvfp4` model (it doesn't need to re-download the weights again so it will pull quickly) which has the presence penalty turned on which will stop it from thinking so much. You can also try `/set nothink` in the CLI which will turn off thinking entirely.

Patrick_Devine · 2026-03-31T22:31:41 1774996301

Try it with mxfp8 or bf16. It's a decent model for doing tool calling, but I wouldn't recommend using it with 4 bit quantization.

Patrick_Devine · 2026-02-24T23:29:31 1771975771

I noticed the same thing. I'm assuming they forgot to photoshop out the Chinese characters.