PEP 393 vs UTF-8 Everywhere

Pete Forman petef4+usenet at gmail.com
Fri Jan 20 19:51:53 EST 2017


MRAB <python at mrabarnett.plus.com> writes:

> As someone who has written an extension, I can tell you that I much
> prefer dealing with a fixed number of bytes per codepoint than a
> variable number of bytes per codepoint, especially as I'm also
> supporting earlier versions of Python where that was the case.

At the risk of sounding harsh, if supporting a variable number of bytes
per codepoint is a pain, you should roll with it for the greater good of
supporting users.

PEP 393 / Python 3.3 required extension writers to revisit their access
to strings. My explicit question was why PEP 393, rather than another
approach, was adopted to replace the deficient old implementations (the
narrow and wide builds). The implicit question is whether a UTF-8
internal representation should replace that of PEP 393.
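
To make the trade-off concrete, here is a rough sketch that compares the
two layouts for a few sample codepoints. It assumes CPython 3.3 or later,
where sys.getsizeof reflects the PEP 393 storage; the helper names
storage_width and utf8_width are made up for this illustration and are
not part of any API.

```python
import sys

def storage_width(ch: str) -> int:
    """Bytes CPython spends per codepoint in a string made only of `ch`.

    `ch` is assumed to be a single codepoint. This relies on CPython's
    PEP 393 layout and on sys.getsizeof, so it is an estimate for that
    implementation only.
    """
    # Compare two freshly built strings so the fixed object header cancels out.
    return sys.getsizeof(ch * 3) - sys.getsizeof(ch * 2)

def utf8_width(ch: str) -> int:
    """Bytes the same codepoint occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

# ASCII, Latin-1, BMP, and astral samples (hypothetical choices).
SAMPLES = ["a", "\u00e9", "\u20ac", "\U0001F600"]

for ch in SAMPLES:
    print(f"U+{ord(ch):04X}  PEP 393: {storage_width(ch)}  UTF-8: {utf8_width(ch)}")
```

Under PEP 393 the width is fixed within one string but set by its widest
codepoint, so indexing stays O(1). Under UTF-8 the width varies from one
codepoint to the next, which is the case MRAB would rather not handle in
an extension.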

-- 
Pete Forman


