Old Man Yells At Cloud

Paul Moore p.f.moore at gmail.com
Sat Sep 23 09:52:10 EDT 2017


On 23 September 2017 at 12:37, Steve D'Aprano
<steve+python at pearwood.info> wrote:
> 95% of Python is unchanged from Python 2 to 3. 95% of the remaining is a trivial
> renaming or other change which can be mechanically translated using a tool like
> 2to3. Only the remaining 5% of 5% is actually tricky to migrate. If your code
> base is full of things relying on that 5% of 5%, then you'll struggle.
> Otherwise, it's probably much easier than people expect.

And in my experience, one of the worst difficulties is the transition
to "clean" Unicode handling. I've seen many Python 2 codebases that
mostly work by assuming often-but-not-always-true things like
"everyone uses UTF-8" or "subprocesses always use the same encoding
as my code". When those assumptions fail, people introduce "fixes" by
tactical re-encoding rather than redesigning the code to "decode at
the boundaries", because it's quicker to do so, and everyone has
deadlines.
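
To make "decode at the boundaries" concrete, here is a minimal Python 3
sketch. The function names and the sample data are invented for
illustration, and the Latin-1 input encoding is just an assumption for
the example. The point is only that bytes are decoded once on the way
in, encoded once on the way out, and everything in between sees str.

```python
def read_names(raw, encoding="utf-8"):
    """Input boundary: decode bytes to str exactly once."""
    text = raw.decode(encoding)
    return [line.strip() for line in text.splitlines() if line.strip()]


def shout(names):
    """Core logic: works on str only, with no encode/decode calls."""
    return [name.upper() for name in names]


def write_names(names, encoding="utf-8"):
    """Output boundary: encode str to bytes exactly once."""
    return "\n".join(names).encode(encoding)


# Pretend these bytes came from a file that is known to be Latin-1.
raw_input_bytes = "zoë\nrenée\n".encode("latin-1")

names = read_names(raw_input_bytes, encoding="latin-1")
result = write_names(shout(names))  # UTF-8 bytes, ready to be written out
```

The encoding is a parameter of the boundary functions, so the assumption
lives in one visible place instead of being scattered through the code
as ad hoc re-encoding.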

In those cases, the Python 3 transition can be hard, not because
there's a lot of complexity to writing Python 3 compatible code, but
because Python 3 has a stricter separation between bytes and (Unicode)
strings, and doesn't support sloppy practices that Python 2 lets you
get away with. You *can* write Unicode-clean code in Python 2 (and
then the transition is easy) but many people don't, and that's when
things get difficult. The worst cases here are people who know how to
write good Unicode-safe code, and do so in Python 2, but using a
different approach than the one Python 3 takes. Those people put
effort into writing correct code, and then have to change that, *just*
for the transition - they don't get the correctness benefit that
others do.
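
As a small illustration of that stricter separation (the variable names
and sample bytes are made up for the example): in Python 2, mixing a
unicode object with a byte string triggered an implicit ASCII decode,
which worked until non-ASCII data turned up. Python 3 refuses to mix the
two types at all, so the decode has to be written out.

```python
data = b"caf\xc3\xa9"  # bytes, e.g. as read from a socket or a binary file
label = "name: "       # str, i.e. Unicode text

try:
    combined = label + data  # str + bytes is a TypeError in Python 3
except TypeError as exc:
    mix_error = exc
else:
    mix_error = None

# The fix is to say which encoding the bytes are in (UTF-8 is assumed here).
combined = label + data.decode("utf-8")
```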

(I should also point out that writing Unicode-safe code is hard, from
a *design* perspective, because an awful lot of data comes to you
without a known encoding - text files, for example, or the output of a
subprocess. Sometimes you *have* to guess, or make assumptions, and
Python 3 tends to force you to make those assumptions explicit.
Ironically, it's often the better coders that find this hard, as they
are the ones who worry about error handling, or configuration options,
rather than just picking a value and moving on.)
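
A sketch of what making the assumption explicit can look like for
subprocess output. The helper name run_text and its defaults are
hypothetical, and the child process is a throwaway Python one-liner that
writes a single Latin-1 byte sequence. The guess about the encoding, and
the policy for bytes that do not fit it, are both visible arguments
rather than something buried in the code.

```python
import subprocess
import sys


def run_text(cmd, encoding="utf-8", errors="replace"):
    """Run cmd and return its stdout as str.

    The subprocess gives us bytes with no declared encoding, so both the
    guess (encoding) and the failure policy (errors) are explicit.
    """
    completed = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    return completed.stdout.decode(encoding, errors=errors)


# A child process that writes raw Latin-1 bytes to its stdout.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(b'caf\\xe9\\n')",
]

as_latin1 = run_text(child, encoding="latin-1", errors="strict")
as_utf8 = run_text(child)  # wrong guess: the bad byte becomes U+FFFD
```

With errors="strict" a wrong guess raises UnicodeDecodeError instead,
which is exactly the kind of error-handling decision the careful coder
ends up having to make.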

Paul