I mean, I think we're both in the realm of [citation needed] here. I would argue...

mark-r · on April 14, 2020

The difference is that with UTF-8 you're much more likely to trip over those bugs in random testing. With UTF-16 you're likely to pass all your test cases if you didn't think to include a non-BMP character somewhere. Then someone feeds you an emoji character and you blow up.
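
A minimal sketch of that point, assuming the bug is a naive cut of the encoded bytes at a fixed length (the helper name `survives_truncation` is made up for illustration):

```python
def survives_truncation(text: str, encoding: str, n_bytes: int) -> bool:
    """Return True if naively cutting the encoded bytes still decodes cleanly."""
    truncated = text.encode(encoding)[:n_bytes]
    try:
        truncated.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# UTF-8: ordinary accented text already exposes the bug,
# because "é" is two bytes and the cut lands in the middle of it.
utf8_accented = survives_truncation("héllo", "utf-8", 2)

# UTF-16: the same kind of text passes, since every BMP character
# is exactly one 16-bit code unit and the cut falls on a unit boundary.
utf16_accented = survives_truncation("héllo", "utf-16-le", 4)

# UTF-16 with a non-BMP character: the emoji is a surrogate pair
# (two code units), and the cut leaves a lone high surrogate behind.
utf16_emoji = survives_truncation("hi😀", "utf-16-le", 6)

print(utf8_accented, utf16_accented, utf16_emoji)
```

The UTF-16 version only fails once the test data contains something outside the BMP, which is easy to leave out of a hand-written test suite.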

camgunz · on April 14, 2020

Which is why you should be using a library for all this, one that uses fuzzing and other robustness checks.