It isn't clear from this: can the model do correct emphasis, even on single words? For example:
I did not steal that horse
That is the trivial example of a sentence where the intonation of a single word is what matters: stressing a different word changes the meaning. More importantly, when a human reads something aloud, they vary intonation, volume, and speed.