Wild how bad it is compared to, say, Russet for iOS/iPadOS, which runs these same models at 110 tps.
I'd recommend going for any quantized 1B-parameter model. You can look at Llama 3.2 1B, Gemma 3 1B, or, if you'd like vision, Qwen3-VL 2B.
Appreciate the kind words!
That's using the word "real" very loosely.