Hacker News

The most useful models are image, video, and audio models. It makes sense that we'd make the video models more 4D aware.

Text really hogged all the attention. Media is where AI is really going to shine.

Some of the most profitable models right now are in music, image, and video generation. A lot of people are having a blast doing things they could legitimately never do before, and real working professionals are able to use the tools to get 1000x more done - perhaps providing a path to independence from bigger studios, and certainly more autonomy for those not born into nepotism.

As long as companies don't over-raise like OpenAI, there should be a smooth gradient from next gen media tools to revolutionary future stuff like immersive VR worlds that you can bend like the Matrix or Holodeck.

And I'll just be exceedingly chuffed if we get open source and highly capable world models from the Chinese that keep us within spitting distance of the unicorns.



>> The most useful models are image, video, and audio models

This is wrong. The vast majority of revenue is being generated by text models because they are so useful.


> they are so useful.

Enterprise doesn't know how to use these models to achieve business outcomes.

These subscriptions will unwind, and when they do, it'll be a bloodbath.


I work in an enterprise that uses LLMs all over the place, and it's going well. Our spending is only going to go one way: up.


>Some of the most profitable models right now are in music, image, and video generation.

I don't think many of the companies running these make a profit right now.


> Some of the most profitable models right now are in music, image, and video generation.

Which companies are using these models to run at a profit?


MidJourney, ElevenLabs, Suno, Kling


> MidJourney, ElevenLabs, Suno, Kling

Maybe I need to re-read reports; last I checked, none of those companies were operating at a profit.


That just sounds like text with extra steps.

Fundamentally, what AGI is trying to do is encode the ability to use logic and reason. Tokens, images, video, and audio are all just information of varying entropy density: the output of that reasoning process, or of an emulation of it.


> Fundamentally what AGI is trying to do is to encode ability to logic and reason.

No? The Wason selection task has shown that logic and reason are neither core nor essential to human cognition.

It's really verging on speculation, but see chapter 2 of Jaynes 1976 - in particular the section on spatialization and the features of consciousness.



