[Python-Dev] Migration from Python 2.7 and bytes formatting

Sun Jan 19 03:27:12 CET 2014

Neil Schemenauer writes:

 > That's it.  After sleeping on it, I'm not sure that's enough Python
 > 2.x compatibility to help a lot.  I haven't ported much code to 3.x
 > yet but I imagine the following are major challenges:
 > 
 > - comparisons between str and bytes always returns unequal
 > 
 > - indexing/iterating bytes returns integers, not bytes objects
 > 
 > - concatenation of str and bytes fails (not so bad since
 >   a TypeError is generated right away).
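
For concreteness, here is a minimal sketch of how the three behaviors
listed above show up in Python 3.  The variable names are made up for
illustration only, and the comparison is assumed to run without the -b
or -bb interpreter flags (which turn it into a warning or an error).

```python
text = "abc"
data = b"abc"

# 1. str and bytes never compare equal; no exception is raised.
equal = (text == data)

# 2. Indexing or iterating bytes yields ints; only slicing yields bytes.
first = data[0]
items = list(data)
first_slice = data[0:1]

# 3. Concatenating str and bytes raises TypeError immediately.
try:
    text + data
    concat_failed = False
except TypeError:
    concat_failed = True
```

Only the third case announces itself; the first two fail silently,
which is why they tend to surface far from the line that caused them.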

Experience shows these are rarely major challenges.  The reason we are
having this discussion is that if you are the kind of programmer who
runs into these challenges once, you are likely to run into all of the
above and more, repeatedly, and addressing them using only the
features available in Python up to v3.3 makes your code unreadable.

In other words, it's like unemployment at 5%.  It would be bearable
(just) if the pain were shared equally, with 100% of the people each
5% unemployed; instead, the burden falls on the 5% who are 100%
unemployed.

Now, the problem that many existing libraries face is that they were
designed for monolingual environments where text encodings are more or
less ASCII compatible[1].  If you stay in the Python 2 world, you can
"internationalize" with the existing design, more or less limp along,
fixing encoding bugs as they arise (not "if" but "when", and it can
take a decade to find them all).  But Python 3 *strongly* discourages
that policy.  From the point of view of design for the modern
environment, such libraries really should have their I/O modules
rewritten from scratch (not a huge job), and the necessary adjustments
made in processing code (few but randomly dispersed through the code,
and each a ticking time bomb for your users).  But I stress that the
problem here is that the design of such libraries is at fault, not
Python 3.  The world has changed.[2]
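
One common reading of "rewrite the I/O modules" is to decode at the
input boundary, process only str, and encode at the output boundary.
The sketch below illustrates that shape under stated assumptions: the
function names and the UTF-8 default are hypothetical, not taken from
any particular library.

```python
def read_record(raw, encoding="utf-8"):
    # Input boundary: bytes from the outside world become str here.
    return raw.decode(encoding)


def shout(text):
    # Processing code only ever sees str, never bytes.
    return text.upper()


def write_record(text, encoding="utf-8"):
    # Output boundary: str becomes bytes again on the way out.
    return text.encode(encoding)


wire_in = "caf\u00e9".encode("utf-8")
wire_out = write_record(shout(read_record(wire_in)))
```

With the encoding handled in two places only, the processing code no
longer has to guess what a given byte means.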

And then there are the remaining 5% or so that really need to work
mostly in bytes, but want to use string formatting to format their
byte streams.  I used to think that this was just a porting
convenience, but I was wrong.  Code written this way is often more
concise and more readable than code written using .join() or the
struct module.  It *should* be written using string formatting.  And
that's what PEPs 460 and 461 are intended to address.
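
To illustrate the readability argument, here is a small sketch that
builds the same header line both ways.  It assumes an interpreter that
implements the %-formatting for bytes described in PEP 461 (Python 3.5
or later); the helper names and the header are hypothetical.

```python
def header_join(name, length):
    # Without bytes formatting: convert each piece by hand and join.
    return b"".join([name, b": ", str(length).encode("ascii"), b"\r\n"])


def header_format(name, length):
    # With bytes %-formatting: %s takes a bytes object, %d takes an int.
    return b"%s: %d\r\n" % (name, length)


line_a = header_join(b"Content-Length", 42)
line_b = header_format(b"Content-Length", 42)
```

Both calls produce the same bytes, but the second shows the layout of
the wire format at a glance.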

We'll see what happens as these PEPs are implemented, but I suspect
that we'll find that there are very few bandaids left that are of much
use.  That is, as I claimed above, for the remaining problematic
libraries a redesign will be needed.

Footnotes: 
[1]  In the technical sense that you can rely on ASCII bytes to mean
ASCII characters, and never to be part of a non-ASCII character.

[2]  And if the world *hasn't* changed for your application, what's
wrong with staying with Python 2?