I'm still waiting for a model that generates images as part of its thought process. That's what I was hoping this was!
I wonder what it would take to train a proof of concept. Maybe start with videos of people whiteboarding and explaining math or solving engineering problems visually. Then, create captions that frame these as word problems. Train the model so that, during inference, it generates images—not necessarily for human viewing, but as part of its internal reasoning.
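As a rough sketch of what one training example might look like under this idea: the word problem, then whiteboard frames interleaved with the explanation inside a "thinking" span, then the answer. Everything here is an assumption for illustration, not an existing dataset format or model API: the `WhiteboardExample` fields, the `<problem>`/`<think>`/`<answer>` marker strings, and the frame file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextSegment:
    text: str


@dataclass
class ImageSegment:
    # Hypothetical path to a whiteboard snapshot extracted from a video.
    frame_path: str


Segment = Union[TextSegment, ImageSegment]


@dataclass
class WhiteboardExample:
    problem: str          # caption rewritten as a word problem
    frames: List[str]     # whiteboard snapshots, in chronological order
    narration: List[str]  # what the presenter said while each frame was drawn
    answer: str


def build_sequence(ex: WhiteboardExample) -> List[Segment]:
    """Interleave frames and narration between the problem and the answer.

    The image segments sit inside the <think> span, so a model trained on
    this layout would be asked to produce images as intermediate reasoning
    steps rather than as final output.
    """
    if len(ex.frames) != len(ex.narration):
        raise ValueError("expected one narration snippet per frame")

    seq: List[Segment] = [
        TextSegment("<problem> " + ex.problem),
        TextSegment("<think>"),
    ]
    for frame, note in zip(ex.frames, ex.narration):
        seq.append(ImageSegment(frame))
        seq.append(TextSegment(note))
    seq.append(TextSegment("</think>"))
    seq.append(TextSegment("<answer> " + ex.answer))
    return seq


# Made-up example data, only to show the shape of the sequence.
example = WhiteboardExample(
    problem="A ladder 5 m long leans against a wall with its foot 3 m from the wall. How high does it reach?",
    frames=["frame_001.png", "frame_002.png"],
    narration=["Draw the right triangle.", "Label the sides 3 and 5, solve for the height."],
    answer="4 m",
)
sequence = build_sequence(example)
```

At inference the model would be prompted with only the problem segment and left to generate the rest, images included. Whether those images need to be rendered pixels or could stay as latent image tokens is an open design question this sketch doesn't address.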