[Python-Dev] len(chr(i)) = 2?

Stephen J. Turnbull stephen at xemacs.org
Fri Nov 26 03:42:33 CET 2010


M.-A. Lemburg writes:

 > That would be a possibility as well... but I doubt that many users
 > are going to bother, since slicing surrogates is just as bad as
 > slicing combining code points and the latter are much more common in
 > real life and they do happen to mostly live in the BMP.

That's only if you require 100% fidelity in the data, which may not be
true in some use cases.  Where 99.99% fidelity is good enough, an
unexpected sliced surrogate pair is a show-stopper, while a sliced
combining character sequence is not.  It doesn't stop the show (at
least in Python, and I doubt any correct Unicode process can signal a
fatal error there either; I can put a tilde on a Cyrillic character if
I want to, no?), and the result is probably readable enough that
readers will assume a keypunch error.
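
To make the contrast concrete, here is a minimal sketch.  It assumes a
current Python 3, where str is a sequence of code points, so the
narrow-build situation is simulated by slicing UTF-16 code units by
hand.  The helper names utf16_units and units_to_str are made up for
this illustration and are not part of any library.

```python
import struct

def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints (hypothetical helper)."""
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

def units_to_str(units):
    """Strictly decode UTF-16 code units; raises on a lone surrogate (hypothetical helper)."""
    data = struct.pack("<%dH" % len(units), *units)
    return data.decode("utf-16-le")

# Case 1: a character outside the BMP occupies two UTF-16 code units.
astral = "a\U00010400"            # 'a' + DESERET CAPITAL LETTER LONG I
units = utf16_units(astral)       # three code units: 'a', high surrogate, low surrogate
sliced_units = units[:2]          # the slice cuts the surrogate pair in half

try:
    units_to_str(sliced_units)
    surrogate_slice_decodes = True
except UnicodeDecodeError:
    # A strict codec refuses the lone high surrogate: the show stops here.
    surrogate_slice_decodes = False

# Case 2: a combining sequence is just two ordinary BMP code points.
combining = "\u0438\u0303"        # CYRILLIC SMALL LETTER I + COMBINING TILDE
sliced_combining = combining[:1]  # the slice drops the tilde
encoded = sliced_combining.encode("utf-8")  # still well-formed, encodes without error
```

The first slice leaves data that a strict codec rejects, while the
second merely loses an accent and remains valid text.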

Personally, if available I would always use some such dodge in server
software (I don't care enough about 24x7 availability to write it
myself, though).  And never in a script for interactive use; if
something needs fixing, I may as well take the fatal error and fix it
on the spot.  (Again, "on the spot" for me can mean "tomorrow".)

