As a mere hole in the code space, it would only be annoying, and a performance penalty for validation. But by its very design it leaks, and it has leaked in such ways that it became the worst thing ever to happen to Unicode. I don’t know of a single language or library that uses UTF-16 for strings and actually validates them: every last one really works with sequences of UTF-16 code units, potentially ill-formed, and has APIs that guarantee this will leak to other systems. This has caused a lot of trouble for environments that then try to work with the vastly more sensible UTF-8 (the only credible alternative for interchange). Servo, for example, wanted to work in UTF-8, for massive memory savings and performance improvements, but the web is built on and depends on UTF-16 code unit semantics so heavily that they had to invent WTF-8, which is basically “UTF-8 but with that hole filled in” (well, actually it’s more complicated: half filled in, permitting only unpaired surrogates, so that each string still has only one representation).
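To make the leak concrete, here’s a minimal Rust sketch (the code unit values are just illustrative): a surrogate code point can’t exist in a char or String at all, yet nothing stops one turning up unpaired in a slice of UTF-16 code units handed over from a UTF-16-based system.

```rust
fn main() {
    // The hole: surrogate code points are not Unicode scalar values,
    // so they cannot exist in a Rust char (or String) at all.
    assert_eq!(char::from_u32(0xD800), None);

    // The leak: UTF-16-based systems hand over sequences of code units,
    // and nothing stops an unpaired surrogate from appearing in them.
    let units: Vec<u16> = vec![0x0048, 0x0069, 0xD83D]; // "Hi" plus a lone high surrogate
    assert!(String::from_utf16(&units).is_err());
}
```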
That’s one of the two situations I speak of: when it happens in practice.
The other is… well, much the same really, but when it makes it into specs that others have to care about. The web platform demonstrates this clearly: just about everything is defined with strings as sequences of UTF-16 code units (though, increasingly, new things use UTF-8), so anything that wants to integrate and has a different view of strings must decide how to handle that: whether to be lossy (decode and encode with REPLACEMENT CHARACTER substitution on error) or inconvenient (use a different, non-native string type). Rust has certainly been afflicted by this in a number of cases and ways, generally favouring correctness.
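A rough sketch of that choice in Rust (the function names are mine, purely for illustration): String::from_utf16_lossy is the lossy route, while something like OsString (which on Windows can hold arbitrary 16-bit data, stored internally as WTF-8) is the “different, non-native string type” route.

```rust
/// Option 1 (lossy): any unpaired surrogate becomes U+FFFD REPLACEMENT CHARACTER.
fn decode_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}

/// Option 2 (inconvenient): keep the data intact in a non-native string type.
/// On Windows, OsString can hold arbitrary 16-bit data (stored as WTF-8 internally).
#[cfg(windows)]
fn decode_exact(units: &[u16]) -> std::ffi::OsString {
    use std::os::windows::ffi::OsStringExt;
    std::ffi::OsString::from_wide(units)
}

fn main() {
    // "a" followed by an unpaired high surrogate: a perfectly possible sequence
    // of UTF-16 code units, but not well-formed UTF-16.
    let units = [0x0061u16, 0xD800];

    assert_eq!(decode_lossy(&units), "a\u{FFFD}");

    #[cfg(windows)]
    {
        // The data survives, but it will never be a &str without loss.
        assert!(decode_exact(&units).to_str().is_none());
    }
}
```

The lossy route silently destroys data; the exact route keeps it, but forces every consumer to deal with a type that isn’t str.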
So: the problem is that the Unicode standard was compromised for the sake of a buggy encoding (they should instead have written UCS-2 off as a failed experiment), and every implementation that uses that buggy encoding is itself buggy, and that bugginess has made it into many other standards (e.g. ECMAScript).