Hacker News

wchar_t was advanced for its time: Microsoft was an early adopter of Unicode, and the ANSI codepage system it replaced was real hell, but it was what almost everyone else was using. UTF-8's dominance is much more recent than Linux users tend to assume. Linux didn't (and in many places still doesn't) support Unicode at all, but an API that passes through ASCII or locale-based ANSI can have its docs changed to say UTF-8 without really being wrong. Outside of the kernel interface, languages such as Python and Java used UTF-16 for their string types. Even for a UTF-8 protocol like HTTP, UTF-16 was assumed to be the better choice for JS. Only now that it is obvious that UTF-16 is worse (as opposed to just having an air of "legacy") is Microsoft transitioning to UTF-8 APIs.
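The usual argument for "UTF-16 is worse" is that characters outside the Basic Multilingual Plane need surrogate pairs, so code-unit counts and naive indexing stop matching user-perceived characters. A small Python sketch (the snake emoji is just an arbitrary non-BMP character):

```python
s = "\U0001F40D"  # U+1F40D, a code point outside the BMP

# One code point, but the per-encoding storage differs:
print(len(s))                           # 1 code point
print(len(s.encode("utf-8")))           # 4 bytes in UTF-8
print(len(s.encode("utf-16-le")) // 2)  # 2 UTF-16 code units (a surrogate pair)
```

Languages whose string length is defined in UTF-16 code units (Java, JavaScript) report 2 for this character, which is where most of the off-by-one bugs come from.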


> an API that passes through ASCII or locale-based ANSI can have its docs changed to say UTF-8 without really being wrong

Actually, it can be wrong, and just changing the documentation is almost always a bad idea, I think, unless the problem was an error in the original documentation. Sometimes it is better to say that the API accepts ASCII but passes 8-bit characters through as well (without caring what they are), or something like that. For font rendering it will be necessary to be more specific, although that might also depend on the font.

> it is obvious that UTF-16 is worse

UTF-16 is not always worse. It depends both on the program (what requirements it has for processing the text) and on the language of the text. And then there is also UTF-32. (And sometimes Unicode is worse regardless of the encoding.)
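The size tradeoff is easy to demonstrate: UTF-8 wins for ASCII-heavy text, but for scripts like CJK (three bytes per character in UTF-8, two in UTF-16) UTF-16 is more compact. A quick Python comparison (the sample strings are arbitrary):

```python
samples = {"English": "Hello, world", "Japanese": "こんにちは世界"}

for name, text in samples.items():
    # Encoded size in bytes for each Unicode encoding form
    sizes = {enc: len(text.encode(enc))
             for enc in ("utf-8", "utf-16-le", "utf-32-le")}
    print(name, sizes)
```

For the English sample, UTF-8 uses half the bytes of UTF-16; for the Japanese one, UTF-16 uses two thirds of the bytes of UTF-8. UTF-32 is the largest in both cases but gives fixed-width code points.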



