[Python-Dev] PEP 414

Vinay Sajip vinay_sajip at yahoo.co.uk
Thu Mar 1 20:00:57 CET 2012


Guido van Rossum <guido <at> python.org> writes:

> I noticed there were some complaints about unnecessarily offensive
> language in PEP 414. Have those passages been edited to everyone's
> satisfaction?

I'm not sure if Nick has finished his updates, but I for one would like to see
some improvements in a few places:

"Many thought that the unicode_literals future import might make a common source
possible, but it turns out that it's doing more harm than good."

Rather than talking about it doing more harm than good, it would be better to
say that unicode_literals is not the best solution in some scenarios
(specifically, WSGI, but any other scenarios can also be mentioned). The "more
harm than good" is not true in all scenarios, but as it's worded now, it seems
like it is always a bad approach.

"(either by having a u function that marks things as unicode without future
imports or the inverse by having a n function that marks strings as native).
Unfortunately, this has the side effect of slowing down the runtime performance
of Python and makes for less beautiful code."

The u() and n() approaches are not equivalent: n() only has to be used when
unicode_literals is in effect, and n() calls would be far less frequent in an
application than u() calls would be in the absence of unicode_literals. In at
least some cases, the APIs which fail unless native strings are provided may
themselves be at fault (e.g. some database adapters expect datetimes in ISO
format as native strings, where there is no apparent reason why they couldn't
accept them as text).
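
To make the asymmetry concrete, here is a minimal sketch of an n() helper,
assuming a module that uses unicode_literals on 2.x. The helper body, the PY2
flag and the header example are my own illustration, not code from the PEP:

```python
import sys

PY2 = sys.version_info[0] == 2  # assumption: a simple version flag


def n(s):
    """Return s as the native str type of the running interpreter."""
    if PY2 and not isinstance(s, str):
        # 2.x with unicode_literals: 'xxx' is unicode, so encode to a byte str
        return s.encode("ascii")
    # 3.x (or an already-native 2.x string): nothing to do
    return s


# Ordinary text needs no wrapper at all when unicode_literals is in effect.
greeting = "hello"

# Only call sites that insist on native strings need n(), for example a
# WSGI-style header tuple.
header = (n("Content-Type"), n("text/plain"))
```

With this arrangement the wrapper appears only at the few native-string call
sites, whereas a u() helper has to wrap every text literal.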

As far as "less beautiful" code is concerned, it's subjective: I see nothing
especially ugly about 'xxx' for text, and certainly don't find u'xxx' "more"
beautiful - and I doubt if I'm the only person with that view. The point about
the added cognitive burden of semantic-changing __future__ imports is, however,
quite valid.

"As it stands, when chosing between 2.7 and Python 3.2, Python 3 is currently
not the best choice for certain long-term investments, since the ecosystem is
not yet properly developed, and libraries are still fighting with their API
decisions for Python 3."

This looks set to become a self-fulfilling prophecy, if you take it seriously. You
would expect that, if Python 3 is the future of Python, then Python 3 is
*precisely* the choice for *long*-term investments. The ecosystem is not yet
fully developed, true: but that is because some people aren't ready to grasp the
nettle and undergo the short-term pain required to get things in order. By
"things", I mean places in existing 2.x code where no distinction was made
between bytes and text, which you could get away with because of 2.x's forgiving
nature. Whether you're using unicode_literals and 'xxx' or u'xxx', these things
will need to be sorted out, and the syntax element is only one possible focus.

If that entire sentence is removed, it does the PEP no harm, and the PEP will
antagonise fewer people.

"A valid point is that this would encourage people to become dependent on Python
3.3 for their ports. Fortunately that is not a big problem since that could be
fixed at installation time similar to how many projects are currently invoking
2to3 as part of their installation process."

Yes, but avoiding the pain of running 2to3 is precisely what (at least in part)
motivates the PEP in the first place. This appears to move the pain that 2.x
developers feel when trying to move to 3.x onto people who want to support
3.2 and 3.3 and 2.6+ in the same codebase.

"For Python 3.1 and Python 3.2 (even 3.0 if necessary) a simple on-installation
hook could be provided that tokenizes all source files and strips away the
otherwise unnecessary u prefix at installation time."

There's some confusion about this hook: the PEP calls it an on-installation
hook (like 2to3), but Nick said it was an import-time hook. I'm more comfortable
with the latter, since it has a chance of providing acceptable performance for a
large codebase: it will only kick in when .py files are newer than their
.pyc. A 2to3-like hook, when working with a large codebase like Django, is
likely to be about as painful as people are finding 2to3 now (when used in an
edit-test-edit-test workflow).
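
Whichever way the hook is triggered, the transformation itself could look
something like the sketch below. This is only an illustration using the
standard tokenize module; the PEP does not specify an implementation, and the
function name strip_u_prefix is hypothetical:

```python
import io
import tokenize


def strip_u_prefix(source):
    """Return source text with the u/U prefix removed from string literals."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING and tok.string[:1] in ("u", "U"):
            # Drop only the prefix character; the literal body is untouched.
            tok = tok._replace(string=tok.string[1:])
        result.append(tok)
    # untokenize() uses the recorded token positions to restore the spacing
    # between tokens.
    return tokenize.untokenize(result)


converted = strip_u_prefix("x = u'abc'\ny = 'plain'\n")
```

An import-time variant would run the same function only for modules whose .py
file is newer than the cached .pyc, which is where the workflow difference
comes from.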

"Possible Downsides" does not mention any possible adverse impact on a single
codebase for 3.2/3.3. I mention this only because it's still not clear how the
hook, which is meant to make 3.2 development easier, will work (in terms of its
impact on development workflow).

In the section on "Modernizing code",

"but to make strings cheap for both 2.x and 3.x it is nearly impossible. The way
it currently works is by abusing the unicode-escape codec on Python 2.x native
strings."

IIUC, the unicode-escape codec is only needed if you don't use unicode_literals
- am I wrong about that? How are strings not equally cheap (near enough) on 2.x
and 3.x if you use unicode_literals?
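
For reference, this is my understanding of what the unicode-escape trick does
in a u() wrapper. The sketch runs on 3.x and uses a bytes object to stand in
for a 2.x native string; the u() body is an assumption about how such wrappers
are typically written, not code taken from the PEP:

```python
# On 2.x without unicode_literals, the literal 'caf\u00e9' is a byte string
# that still contains the backslash sequence. A bytes object simulates it here.
native = b"caf\\u00e9"


def u(s):
    """Hypothetical 2.x-style u(): decode an escaped native string to text."""
    return s.decode("unicode_escape")


text = u(native)  # a 4-character text string ending in U+00E9
```

With unicode_literals in effect the literal is already text when the module is
compiled, so no such decoding step is involved.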

In the "Runtime overhead of wrappers" section, the timings may be valid, but a
caveat should be added to the effect that in a realistic workload, the wrapper
overhead will be somewhat diluted where wrapper calls are fairly infrequent
(i.e. the unicode_literals and n() case).
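
A rough way to see the dilution is to scale the per-call wrapper cost by the
fraction of literals that are actually wrapped. The sketch below is my own
illustration, with a trivial identity n() and an arbitrary example fraction; it
is not the benchmark used in the PEP, and the figures will vary by machine:

```python
import timeit


def n(s):
    # Stand-in wrapper: identity, as n() would be on 3.x.
    return s


def bare():
    return "xxx"


def wrapped():
    return n("xxx")


CALLS = 100000
t_bare = timeit.timeit(bare, number=CALLS)
t_wrapped = timeit.timeit(wrapped, number=CALLS)

# Extra cost per wrapped literal; may be noisy on a loaded machine.
per_call_overhead = (t_wrapped - t_bare) / CALLS


def diluted_overhead(per_call, wrapped_fraction):
    """Average extra cost per literal when only some literals are wrapped."""
    return per_call * wrapped_fraction


# Assumption for illustration only: 1 literal in 100 needs n().
estimate = diluted_overhead(per_call_overhead, 0.01)
```

The wrapped fraction is the number that differs between the two approaches:
close to 1 when u() wraps every text literal, and small when only native-string
call sites use n().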

Of course, if the PEP is targeting Python 2.5 and earlier where unicode_literals
is not available, then it should say so. I would say that the overall impression
given by the PEP is that "the unicode_literals approach is not worth bothering
with", and that I do not find to be true based on my own experience.

Regards,

Vinay Sajip


