[Python-Dev] len(chr(i)) = 2?
Stephen J. Turnbull
stephen at xemacs.org
Fri Nov 26 03:42:33 CET 2010
M.-A. Lemburg writes:
> That would be a possibility as well... but I doubt that many users
> are going to bother, since slicing surrogates is just as bad as
> slicing combining code points and the latter are much more common in
> real life and they do happen to mostly live in the BMP.
That's only true if you require 100% fidelity in the data, which may not
be the case in some applications. Where 99.99% fidelity is good enough, an
unexpected sliced surrogate pair is a show-stopper, while a sliced
combining character sequence doesn't stop the show. At least in Python
it raises nothing, and I doubt any correct Unicode process can signal a
fatal error there either (I can put a tilde on a Cyrillic character if
I want to, no?). The result is probably readable enough that readers
will assume a keypunch error.
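To make the contrast concrete, here is a minimal sketch. It assumes a
modern CPython (3.3 or later), where str is indexed by code point, so the
UTF-16 code units of a narrow build are simulated by encoding to
UTF-16-LE bytes. The helper name slice_utf16_units and the sample
characters are illustrative choices, not anything from the thread.

```python
# Assumption: CPython 3.3+, where str is indexed by code point. UTF-16 code
# units (as seen on old narrow builds) are simulated via UTF-16-LE bytes.

def slice_utf16_units(text, n_units):
    """Keep the first n_units UTF-16 code units of text, as raw bytes."""
    return text.encode("utf-16-le")[: 2 * n_units]

astral = "\U0001D11E"      # MUSICAL SYMBOL G CLEF, outside the BMP
combined = "\u0438\u0303"  # CYRILLIC SMALL LETTER I + COMBINING TILDE

# Slicing through a surrogate pair leaves a lone high surrogate, which a
# strict decoder rejects.
half_pair = slice_utf16_units(astral, 1)
try:
    half_pair.decode("utf-16-le")
    surrogate_ok = True
except UnicodeDecodeError:
    surrogate_ok = False

# Slicing through a combining sequence merely drops the tilde; what is
# left is still well-formed text that round-trips without error.
base_only = combined[:1]
combining_ok = base_only.encode("utf-8").decode("utf-8") == "\u0438"
```

The sliced surrogate pair fails to decode at all, while the sliced
combining sequence silently loses its accent and stays readable.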
Personally, I would always use some such dodge in server software if one
were available (I don't care enough about 24x7 availability to write it
myself, though). I would never use one in a script for interactive use:
if something needs fixing, I may as well take the fatal error and fix it
on the spot. (Again, "on the spot" for me can mean "tomorrow".)
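As a rough illustration of that server-versus-script policy, the sketch
below switches the codec error handler. This is only one possible dodge
and not necessarily the one discussed earlier in the thread; the function
name decode_units and the tolerant flag are hypothetical.

```python
# Hypothetical helper: "tolerant" stands in for "running as a server".
def decode_units(raw, tolerant):
    """Decode UTF-16-LE bytes, replacing bad data only when tolerant."""
    errors = "replace" if tolerant else "strict"
    return raw.decode("utf-16-le", errors=errors)

# "ab" followed by a lone high surrogate (a sliced surrogate pair).
damaged = "ab".encode("utf-16-le") + b"\x34\xd8"

# Server-style: keep going, marking the damage with U+FFFD.
served = decode_units(damaged, tolerant=True)

# Script-style: take the fatal error and fix the data.
try:
    decode_units(damaged, tolerant=False)
    script_failed = False
except UnicodeDecodeError:
    script_failed = True
```

The tolerant path trades a little fidelity for availability; the strict
path surfaces the problem immediately.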