1B is actually huge for a TTS model. Here's an 82M-parameter model with probably the most stable/coherent output of all the open-weights TTS models I've tested: https://huggingface.co/spaces/hexgrad/Kokoro-TTS
But if you mean zero-shot cloning, yeah they all seem to have those slurred speech artefacts from time to time.