I am not sure about that! It seems like smaller, more task-specific models are emerging, and their size gives them much lower latency. For example: https://arxiv.org/abs/2305.07759
This might be worked around with a characteristic verbal affectation that makes the agent's utterances initially long-winded: once the LLM's lag has passed, the real content can begin to flow. Thinking fast and slow.
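A minimal sketch of what I mean, with `fake_llm_stream` standing in for any streaming LLM API (it's a mock, not a real library call), and the delay values just illustrative:

```python
import asyncio

async def fake_llm_stream(prompt: str):
    """Mock of a streaming LLM API: slow time-to-first-token, then fast tokens."""
    await asyncio.sleep(1.5)  # simulated lag before the first token arrives
    for token in "the answer to your question is forty-two.".split():
        yield token + " "
        await asyncio.sleep(0.05)

async def respond(prompt: str):
    # Speak the filler affectation immediately to cover the model's lag...
    print("Well, let me think about that for a moment... ", end="", flush=True)
    # ...then hand over to the real content as tokens start flowing.
    async for token in fake_llm_stream(prompt):
        print(token, end="", flush=True)
    print()

asyncio.run(respond("What is the meaning of life?"))
```

The canned phrase buys roughly the time-to-first-token, so to the listener the response feels instantaneous even though the model hasn't produced anything yet.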