PEP 393 vs UTF-8 Everywhere

Pete Forman petef4+usenet at gmail.com
Fri Jan 20 17:35:02 EST 2017


Can anyone point me to a rationale for PEP 393 being incorporated in
Python 3.3 in preference to using UTF-8 as the internal string
representation? I've found good articles by Nick Coghlan, Armin Ronacher
and others on the matter. What I have not found is a discussion of the
pros and cons of alternatives to the old narrow or wide implementations
of Unicode strings.
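
To make the comparison concrete, here is a rough sketch of how the two
representations differ in storage cost. PEP 393 stores each string with
1, 2 or 4 bytes per code point, chosen by the widest code point in that
string, whereas UTF-8 uses 1 to 4 bytes for each code point
individually. The helper name bytes_per_char is my own, and measuring
with sys.getsizeof is a CPython-specific approximation, not something
the PEP prescribes.

```python
import sys


def bytes_per_char(ch: str, n: int = 1000) -> float:
    """Estimate CPython's per-character storage for strings made of `ch`.

    Hypothetical helper: it compares the sizes of a 1-character and an
    (n + 1)-character string so that the fixed object header cancels out.
    """
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch)) / n


# One sample character from each width class: ASCII, Latin-1, BMP, astral.
for ch in ("a", "\u00e9", "\u20ac", "\U0001F600"):
    pep393 = round(bytes_per_char(ch))
    utf8 = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}: PEP 393 ~{pep393} byte(s)/char, UTF-8 {utf8} byte(s)/char")
```

Note that under PEP 393 a single astral character widens the whole
string to 4 bytes per code point, while under UTF-8 only that character
costs 4 bytes.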

It seems to me that most operations on strings go through iterators and
are thus agnostic as to whether the encoding is variable or fixed
width. How important is it to be able to get to part of a string with a
simple index? Just because old-school strings could be treated as a
sequence of characters, is that a reason to shoehorn the subtleties of
Unicode into that model?
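
As an illustration of the trade-off I mean, here is a minimal sketch
that works on UTF-8 bytes directly. Iteration is a single sequential
pass, but reaching the n-th code point means skipping over everything
before it. The function names iter_code_points and code_point_at are
made up for this example, and the code assumes well-formed UTF-8; it is
not how any real implementation does it.

```python
from typing import Iterator


def iter_code_points(data: bytes) -> Iterator[str]:
    """Yield code points from UTF-8 bytes in one pass (assumes valid UTF-8)."""
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte alone tells us how many bytes this code point uses.
        if lead < 0x80:
            width = 1
        elif lead < 0xE0:
            width = 2
        elif lead < 0xF0:
            width = 3
        else:
            width = 4
        yield data[i:i + width].decode("utf-8")
        i += width


def code_point_at(data: bytes, index: int) -> str:
    """Return the index-th code point; O(index), since earlier ones are scanned."""
    for pos, ch in enumerate(iter_code_points(data)):
        if pos == index:
            return ch
    raise IndexError(index)


text = "na\u00efve \u20ac5 \U0001F600"
encoded = text.encode("utf-8")

# Iterating the UTF-8 bytes yields the same characters as iterating the str.
print(list(iter_code_points(encoded)) == list(text))
# Indexing gives the same answer as text[6], but only after a linear scan.
print(code_point_at(encoded, 6) == text[6])
```

With a fixed-width layout such as PEP 393's, text[6] is a direct offset
calculation, which is the convenience I am questioning the value of.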

-- 
Pete Forman


