> But the fact that we need special libraries to answer fairly basic queries about unicode text doesn't bode well.

That's always been needed to work with Unicode properly; what do you think ICU is? Few if any languages have complete native Unicode support. And it's hardly new: Unicode has an annex (#29) dedicated to text segmentation: http://www.unicode.org/reports/tr29/
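
To make that concrete, here is a minimal Python sketch (my own illustration, not taken from ICU or the annex) of why "how long is this string" is really three different questions. `rough_clusters` is a hypothetical helper that only glues combining marks onto the preceding character. It is a deliberately crude stand-in for the UAX #29 rules and gets things like emoji ZWJ sequences and Hangul jamo wrong, which is the whole reason a real segmentation library exists.

```python
import unicodedata


def rough_clusters(text):
    """Group each combining mark with the character before it.

    ASSUMPTION: this is a toy approximation, NOT an implementation of
    UAX #29 grapheme cluster boundaries.
    """
    clusters = []
    for ch in text:
        # A nonzero canonical combining class means "attaches to the previous char".
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


s = "e\u0301a"  # 'e' + COMBINING ACUTE ACCENT + 'a', displayed as "éa"

print(len(s.encode("utf-8")))   # 4 bytes
print(len(s))                   # 3 code points
print(len(rough_clusters(s)))   # 2 user-perceived characters (roughly)
```

For anything beyond a demo you would want a library that actually implements the annex instead of a loop like this.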