[Python-Dev] PEP 414 - Unicode Literals for Python 3

Tue Feb 28 15:20:46 CET 2012

On 28/02/2012 14.19, Antoine Pitrou wrote:
> Le mardi 28 février 2012 à 22:14 +1000, Nick Coghlan a écrit :
>> If you're using separate branches, then your Python 2 code isn't being
>> made forward compatible with Python 3. Yes, it avoids making your
>> Python 2 code uglier, but it means maintaining two branches in
>> parallel until you drop Python 2 support.
> IMO, maintaining two branches shouldn't be much more work than
> maintaining hacks so that a single codebase works with two different
> programming languages.

+10

For every CPython bug that I fix I first apply the patch on 2.7, then on 
3.2 and then on 3.3.
Most of the time I don't even need to change anything while applying the 
patch to 3.2, sometimes I have to do some trivial fixes.  This is also 
true for another personal 12kloc project* where I'm using the 
two-branches approach.

For me, the costs of having two branches are:
  1) a one-time conversion when the Python3-compatible branch is created 
(can be done easily with 2to3);
  2) merging the fix I apply to the Python2 branch (and with modern DVCS 
this is not really an issue).

With the shared code base approach, the costs are:
  1) a one-time conversion to "fix" the code base and make it run on 
both 2.x and 3.x;
  2) keep using and having to deal with hacks in order to keep it running.

With the first approach, you also have two clean and separate code 
bases, with no hacks; when you stop using Python 2, you end up with a 
clean Python 3 branch.
The one-time conversion also seems easier in the first case.

(Note: there are also other costs -- e.g. releasing -- that I haven't 
considered because they don't affect me personally, but I'm not sure 
they are big enough to make the two-branches approach worse.)

>
>> You've once again raised the
>> barrier to entry: either people contribute two patches, or they accept
>> that their patch may languish until someone else writes the patch for
>> the other version.
> Again that's wrong. If you cleverly use 2to3 to port between branches,
> patches only have to be written against the 2.x version.

After the initial conversion of the code base, the fixes are mostly 
trivial, so people don't need to write two patches (most of the patches 
we get for CPython are either against 2.7 or 3.2, and sometimes they 
even apply clearly to both).

Using 2to3 to generate the 3.x code automatically for every change 
applied to the 2.x branch (or to convert everything when a new package 
is installed) sounds wrong to me.  I wouldn't trust generated code even 
if 2to3 was a better tool.

That said, I successfully used the shared code base approach with 
print_function, unicode_literals, and a couple of try/except for the 
imports for a few one-file scripts (for 2.7/3.2) that I wrote recently.

TL;DR the two-branches approach usually works better (at least IME) than 
the shared code base approach, doesn't necessarily require more work, 
and doesn't need ugly hacks to work.

* in this case all the string literals I had were already text (rather 
than bytes) and even without using unicode_literals they worked out of 
the box when I moved the code to 3.x.  There was however a place where 
it didn't work, and that turned out to be a bug even in Python 2 because 
I was mixing bytes and text.

Best Regards,
Ezio Melotti

> Regards
>
> Antoine.