[Python-Dev] Bytes path support

Stephen J. Turnbull stephen at xemacs.org
Sat Aug 23 12:14:47 CEST 2014


Oleg Broytman writes:

 >    This is the core of the problem. Python2 favors Unix model but
 > Windows people pays the price. Python3 reverses that

This is certainly not true.  What is true is that Python 3 makes no
attempt to make it easy to write crappy software in the old Unix
style, that breaks when unexpected character encoding are encountered.
Python 3 is designed to make it easier to write reliable software,
even if it will only ever be used on one platform.  Nevertheless, it's
still a reasonable language for writing byte-shoveling software, with
the last piece in place as of the acceptance of PEP 461.

As of that PEP, you can use regexps for tokenizing byte streams and
%-formatting to conveniently produce them.  If you want to treat them
piecewise as character streams with different encodings, you have a
large library of codecs, which provide an incremental decoder
interface.  While AFAIK no codec implements a decode-until-error mode,
that's not all that much of a loss, as many encodings overlap.  Eg, if
you start decoding using a latin-1 codec, decoding the whole document
will succeed, even if it switches to windows-1251 in the meantime.

Oleg, I gather Russian is your native language.  That's moderately
complicated, I admit.  But the Russians are a distant second to the
Japanese in self-destructive proliferation of incompatible character
coding standards and non-standard variants.  After 24 years of dealing
with the mess that is East Asian encodings (which is even bound up
with the "religion" of Japanese exceptionalism -- some Japanese have
argued that there is a spiritual superiority to Japanese JIS codes!),
I cannot believe you are going to find a better environment for
dealing with these issues than Python 3.




More information about the Python-Dev mailing list