I thought this would be about text-to-speech applications, while this seems more like an encoder-decoder problem (make the network learn a pattern and then let it reproduce it). I'm wondering how long it is until we see working TTS based on LSTM RNNs.