[Python-Dev] PEP 414 - Unicode Literals for Python 3

Tue Feb 28 18:41:24 CET 2012

On 28/02/2012 18.08, Vinay Sajip wrote:
> Ezio Melotti <ezio.melotti at gmail.com> writes:
>> For every CPython bug that I fix I first apply the patch on 2.7, then on
>> 3.2 and then on 3.3.
>> Most of the time I don't even need to change anything while applying the
>> patch to 3.2, sometimes I have to do some trivial fixes.  This is also
>> true for another personal 12kloc project* where I'm using the
>> two-branches approach.
> I hear what you say about the personal project, but IMO CPython is atypical (as
> far as this discussion is concerned), not least because it's not a pure-Python
> project.

Most of the things I fix are pure Python; I wasn't considering the C 
patches and doc fixes here.

>> For me, the costs of having two branches are:
>>    1) a one-time conversion when the Python3-compatible branch is created
>> (can be done easily with 2to3);
> Yes, but the amount of ease is project-dependent. For example, 2to3 wraps
> values() method calls with list(), which is a reasonable thing to do for dicts;
> when presented with Django's querysets, which have a values() method which should not
> be wrapped, then you have to go through and sort things out. I'm not knocking
> 2to3, which I think is great. Just that things go well sometimes, and less well
> at other times.
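
To make the values() case concrete, here is a minimal sketch. FakeQuerySet is a
hypothetical stand-in written for this example, not Django's real class; it only
assumes that values() returns another chainable object rather than a dict view.

```python
# Hypothetical stand-in for an ORM queryset; not Django's real class.
class FakeQuerySet:
    def __init__(self, rows):
        self._rows = rows

    def values(self):
        # Returns another chainable queryset, not a dict view.
        return FakeQuerySet([dict(r) for r in self._rows])

    def filter(self, **conditions):
        kept = [r for r in self._rows
                if all(r.get(k) == v for k, v in conditions.items())]
        return FakeQuerySet(kept)

    def __iter__(self):
        return iter(self._rows)


qs = FakeQuerySet([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])

# Intended code: the result of values() stays chainable.
chained = qs.values().filter(name="b")

# What a blanket list(x.values()) rewrite produces: a plain list,
# which no longer has a filter() method.
wrapped = list(qs.values())
```

For a dict the list() wrapper is harmless, but here it turns a chainable object
into a list, so each such call has to be reviewed by hand.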

With the personal project this is what I did:
  1) make a separate branch;
  2) run 2to3 and let it overwrite the file;
  3) review the changes as I would do with any other patch before 
committing;
  4) fix things that 2to3 missed and other minor glitches;
  5) fix a few bugs that surfaced after the port (and were in the 
original code too);

The fixes made by 2to3 were mostly:
  * removing u'' from  strings;
  * renaming imports, methods (like the .iteritems);
  * adding 'as' in the "except"s;
  * adding () for a few "print"s;

These changes affected about 500 lines of code (out of 12kloc).
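
As a rough illustration of the kinds of rewrites listed above (not code from the
project itself), the sketch below shows the Python 3 result of each change with
the Python 2 original in a trailing comment. All names (settings, label,
parse_port) are made up for the example.

```python
# Python 3 result of each rewrite; the Python 2 original is in the comment.
# All names here are hypothetical.
from urllib.parse import urlparse   # was: from urlparse import urlparse

settings = {"host": "example.org", "port": "8080"}

label = "caf\u00e9"                 # was: u"caf\u00e9"

pairs = sorted(settings.items())    # was: sorted(settings.iteritems())

host = urlparse("http://example.org:8080/").hostname


def parse_port(value):
    try:
        return int(value)
    except ValueError as exc:       # was: except ValueError, exc:
        print("bad port:", exc)     # was: print "bad port:", exc
        return None
```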

The changes I did manually after running 2to3 were (some were not 
strictly necessary):
  * removing 'object' from classes;
  * removing ord() in a few places;
  * removing the arguments of super(...);
  * replacing codecs.open() with open();
  * removing a few .decode('utf-8');
  * adding a couple of b'';
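
The manual cleanups above could look roughly like the following sketch. It is an
illustration under assumed names (Base, Child, first_byte, read_text, MAGIC), not
the project's actual code; the Python 2 form is kept in the comments.

```python
class Base:                          # was: class Base(object):
    def describe(self):
        return "base"


class Child(Base):
    def describe(self):
        # was: super(Child, self).describe()
        return "child of " + super().describe()


def first_byte(data):
    # Indexing bytes yields an int on Python 3, so ord() is not needed.
    return data[0]                   # was: ord(data[0])


def read_text(path):
    # was: codecs.open(path, encoding="utf-8")
    with open(path, encoding="utf-8") as fh:
        return fh.read()             # already str; no .decode('utf-8')


MAGIC = b"\x89PNG"                   # was: "\x89PNG" (a byte string on 2.x)
```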

After a couple of days almost everything was working fine.

>
>> With the shared code base approach, the costs are:
>>    1) a one-time conversion to "fix" the code base and make it run on
>> both 2.x and 3.x;
>>    2) keep using and having to deal with hacks in order to keep it running.
> Which hacks do you mean, if you're only interested in 2.6+?

Things like try/except for names that changed and wrappers for 
bytes/strings.
Of course the situation is worse for projects that have to support 
earlier versions.
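
A minimal sketch of the sort of hacks meant here, assuming a single code base
that targets 2.6+ and 3.x. The helper names (PY3, text_type, binary_type,
to_text, to_bytes) are hypothetical and chosen only for this example.

```python
import sys

PY3 = sys.version_info[0] >= 3

# try/except for a name that moved between versions
try:
    from urllib.parse import urlencode      # Python 3 location
except ImportError:
    from urllib import urlencode            # Python 2 location

# aliases for the text and bytes types
if PY3:
    text_type = str
    binary_type = bytes
else:
    text_type = unicode                      # noqa: F821 (Python 2 only)
    binary_type = str


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it if it is a byte string."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it if it is text."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value
```

Every module that touches strings or renamed imports then has to go through
wrappers like these, and they stay in the code until 2.x support is dropped.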

>
>> With the first approach, you also have two clean and separate code
>> bases, with no hacks; when you stop using Python 2, you end up with a
>> clean Python 3 branch.
>> The one-time conversion also seems easier in the first case.
>>
>> (Note: there are also other costs -- e.g. releasing -- that I haven't
>> considered because they don't affect me personally, but I'm not sure
>> they are big enough to make the two-branches approach worse.)
> I don't believe there's a one-size-fits-all. The two branches approach is
> appealing, and I have no quarrel with it: but I contend that big projects like
> Django would be reluctant to switch, or take much longer to switch to 3.x, if
> they had to maintain separate branches.

I would actually feel safer doing the port in a separate branch and keeping 
it there.
Changing all the code in the main branch just to make it work for 3.x 
too doesn't strike me as a really good idea.

>   Given the size of their user community,
> they have to follow strict release procedures, which (even with just running on
> 2.x) smaller projects can be more relaxed about.

I don't have much experience regarding releases, but developing on a 
separate branch shouldn't affect the release of the 2.x version.  The 
developers will have to merge the changes to the py3 branch as well, and 
eventually they will be able to ship an additional release for py3.  
Sure, there's more work for the developers, but that's nothing new.

> You forgot to mention the part which is most time-consuming day-to-day: making
> changes and testing. For the two-branch approach, it's
>
> 1. Change on 2.x
> 2. Test on 2.x
> 3. Commit on 2.x
> 4. Merge to 3.x
> 5. Possibly change on 3.x
> 6. Test on 3.x
> 7. Commit on 3.x
>
> where each "test" step, if failures occur, might take you back to a previous
> "change" step.
>
> For the single codebase, that's
>
> 1. Change
> 2. Test on 2.x
> 3. Test on 3.x
> 4. Commit

And if something fails here, you will have to repeat both steps 2 and 3 
until you get it right for both at the same time.

Step 1 of the single codebase is in the end more or less equivalent 
to steps 1+4+5, just in a different way. The remaining extra commit 
takes no time, and since the branches are independent, if you find a 
problem with py3 you don't have to run the test suite for 2.x again.

In my experience with CPython, the most time-consuming part is making 
the patch work on one of the branches in the first place.  Once it works, 
porting it to the other branches is just a mechanical step that doesn't 
really take much time.
The problems during the porting arise when the two codebases have diverged.
(Also keep in mind that we are not actually merging from 2.x to 3.x in 
CPython, otherwise it would be even easier.)

> This, to me, is the single big advantage of the single codebase approach, and
> the productivity improvements outweigh code purity issues which are, in the
> grand scheme of things, not all that large.

ISTM that the amount of time is pretty much the same, so I don't see 
this as a point in favor of the single codebase approach.
I might be wrong (I don't have much experience with the single codebase 
approach), but having to deal with 2+ branches never bothered me (I 
might be biased though, since I was already used to maintaining 3-4 
branches with Python).

> Another advantage is DRY: you don't have to worry about forgetting to merge some
> changes from 2.x to 3.x. Haven't we all been there one time or another? I know I
> have, though I try not to make a habit of it ;-)

I don't think it ever happened to me, but I see how this could happen, 
especially in the first period after the second branch is introduced.  
Your DVCS should warn you about this though, so, at worst, you'll end up 
having to merge someone else's commit.

>
>> After the initial conversion of the code base, the fixes are mostly
>> trivial, so people don't need to write two patches (most of the patches
>> we get for CPython are either against 2.7 or 3.2, and sometimes they
>> even apply cleanly to both).
> Fixes may be trivial, but new features might not always be so.

True, but especially if the feature is complicated, I would rather spend 
a bit more time and have two clean, separate versions than a single 
version that tries to work on both.

Best Regards,
Ezio Melotti

> Regards,
>
> Vinay Sajip
>