
It's interesting that they pitch this for agent development. The realtime API provides a much simpler architecture for developing agents. Why would you want to string together STT -> LLM -> TTS when you could have a consolidated model doing all three steps? They alluded to there being some quality/intelligence benefits to the multi-step approach, but in the long run I'd expect them to improve the realtime API to make this unnecessary.


Text gives developers a lot of flexibility to do other processing, including RAG, calling APIs yourself, and chaining multiple LLM invocations. Getting the low latency of the realtime API means relying fully on a single invocation of their model to do everything.
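
To make the flexibility point concrete, here is a rough sketch of a chained STT -> LLM -> TTS pipeline in which the intermediate text passes through arbitrary hooks (a toy retrieval step here). Everything in it is a hypothetical stand-in: `Pipeline`, `fake_stt`, `fake_llm`, `fake_tts`, and `add_context` are placeholder names, not calls from any real SDK, and a real setup would swap in actual model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Pipeline:
    """Chains speech-to-text, text hooks, an LLM call, and text-to-speech."""
    stt: Callable[[bytes], str]
    llm: Callable[[str], str]
    tts: Callable[[str], bytes]
    # Hooks run on the transcript before the LLM sees it (RAG, API calls, etc.).
    text_hooks: List[Callable[[str], str]] = field(default_factory=list)

    def run(self, audio: bytes) -> bytes:
        text = self.stt(audio)
        for hook in self.text_hooks:
            text = hook(text)
        reply = self.llm(text)
        return self.tts(reply)


# Hypothetical stand-ins so the sketch runs without any network access.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"LLM saw: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


# Toy "retrieval": append any stored snippet whose key appears in the transcript.
DOCS = {"refund": "Refunds take 5 days."}


def add_context(text: str) -> str:
    hits = [snippet for key, snippet in DOCS.items() if key in text.lower()]
    if not hits:
        return text
    return f"{text}\n\nContext: {' '.join(hits)}"


pipeline = Pipeline(fake_stt, fake_llm, fake_tts, [add_context])
audio_out = pipeline.run(b"How do I get a refund?")
```

The point of the sketch is only that every stage boundary is plain text, so you can insert, reorder, or repeat steps there. With a single speech-to-speech invocation there is no such seam unless the model exposes one itself.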


The realtime API can be used to call tools [0], but I agree with your general point on the flexibility of working directly with text.

[0] https://github.com/openai/openai-realtime-agents



