[Python-3000] (PEP 3000) Rethinking 2to3, __future__, and the migration path.

Charles Merriam charles.merriam at gmail.com
Sat Mar 22 01:02:29 CET 2008


I hate bringing up something that has been hashed over so many times,
but I'm a bear of little brain and am not understanding the migration
path.  The whole use of the "2to3" tool seems like an abrupt hack.  It
is workable, but it demands serious planning for a near-term 2.x to 3.x
migration.  I'd like to present a different view.  Instead of just
yelling that it must be wrong, please consider it.  It's not far from
the current migration path and is less rocky.

Moving from 2.6 to 3.0 appears to encompass the following major tasks,
and some minor ones:

1.  Changed: unadorned strings go from defaulting to bytes ('b') to
    defaulting to unicode, like the (now illegal) 'u' prefix.
2.  Changed: print becomes a function.
3.  Changed: the meaning of keys(), items(), and values(), which now
    return views instead of lists.
4.  Removed: default ordering operators, string exceptions, old-style
    classes, etc.
5.  Changed: repr() for long integers.
6.  Completely predictable syntax modifications for raise, except, etc.
etc., etc.
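For concreteness, here is a minimal sketch (an illustration, not part of
the proposal) of how items 3 and 6 can already be written in a form that
both 2.6 and 3.x accept.  The function name describe() is hypothetical.
Printing a single parenthesized argument behaves the same under the 2.6
print statement and the 3.x print function, so no __future__ import is
needed here.

```python
def describe(d):
    """Return report lines using idioms valid in both 2.6 and 3.x."""
    lines = []
    # Item 3: keys() returns a view in 3.x; sorted() accepts either a
    # list or a view and always returns a new list.
    lines.append("keys: " + ", ".join(sorted(d.keys())))
    try:
        # Item 6: "except E as e" parses in 2.6 and is required in 3.x.
        raise ValueError("no more string exceptions")
    except ValueError as e:
        lines.append("caught: " + str(e))
    return lines


for line in describe({"b": 2, "a": 1}):
    print(line)  # one parenthesized argument: same output in 2.6 and 3.x
```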

This is a lot for my head, and even a translation tool is going to
cause odd warnings and syntax errors everywhere.  It's especially hard
because the advantages of using 3.x are not realized immediately upon
conversion.

Now, for me this is (hopefully) academic: I'm hoping not to port BLs
(metric) of code.  Still, out of consideration for programmers, I would
much prefer the following sequence, which, with changes to 2.6, could be
done one module at a time so that converted and unconverted code could
live side by side.  Over time, more of the transitional elements would
become 'required' in my build environment.  That is, I would want to
make my changes gradually and have a reasonable expectation that the
first time I ran 2to3 would also be my last.

The biggest cost, and the one requiring the most thought, is carefully
looking at every single string in my application, deciding whether it
should be bytes or unicode, and adding the 'b' or 'u' adornment
accordingly.  This would take an immense amount of time, requiring
rethinking APIs, duplicating modules that need both byte and unicode
processing, etc.  It's something to do 'later'.  I expect a transition
period of adorning strings whenever I am working on the code anyway.
Eventually, I would enable the Python warning about unadorned strings to
catch the remaining cases, including the times I pull executable code
from a file, database, or socket.
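The warning about unadorned strings is a proposed feature, not something
Python provides.  As a rough sketch, assuming a source-level scan is good
enough, the standard tokenize module can list string literals that carry
neither a 'b' nor a 'u' prefix.  The name unadorned_strings() is
hypothetical, and raw strings with only an 'r' prefix are treated as
unadorned.

```python
import io
import tokenize


def unadorned_strings(source):
    """Return (row, col, literal) for string literals with no b/u prefix.

    Hypothetical stand-in for the proposed "unadorned string" warning:
    it only scans source text and does not change how Python runs it.
    """
    found = []
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        if tok.type != tokenize.STRING:
            continue
        # The prefix is everything before the first quote character.
        quotes = (tok.string.find("'"), tok.string.find('"'))
        quote_at = min(i for i in quotes if i >= 0)
        prefix = tok.string[:quote_at].lower()
        if "b" not in prefix and "u" not in prefix:
            found.append((tok.start[0], tok.start[1], tok.string))
    return found


sample = 'a = "text"\nb = b"raw"\nc = u"unicode"\nd = r"raw"\n'
for row, col, literal in unadorned_strings(sample):
    print("line %d, col %d: %s" % (row, col, literal))
```

Run over a module after each editing session, the list should shrink
toward empty as strings are adorned.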

I would start banning deprecated and changed names, e.g., keys(),
range(), input(), and the like.  When it makes sense, I might use
iterkeys(), xrange(), or raw_input().  When it doesn't, I would
explicitly call functions, rather than rely on idioms, to perform the
old tasks: range_as_list() or input_with_eval().  Again, the pattern is
a period of gradual change, then flipping the compiler switch to warn
about old usage.
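range_as_list() and input_with_eval() are hypothetical names from this
proposal; keys_as_list() is added here in the same spirit.  A minimal
sketch, written so it should run unchanged under 2.6 and 3.x, might look
like this.  input_with_eval() deliberately keeps the old eval() behavior,
which is exactly why it deserves an explicit, searchable name.

```python
# Pick the non-evaluating line reader for whichever major version runs us.
try:
    _read_line = raw_input  # Python 2.x
except NameError:
    _read_line = input  # Python 3.x


def range_as_list(*args):
    """Hypothetical helper: always a real list, as 2.x range() returned."""
    return list(range(*args))


def keys_as_list(d):
    """Hypothetical helper: a list copy of the keys, as 2.x keys() gave."""
    return list(d.keys())


def input_with_eval(prompt="", read_line=None):
    """Hypothetical helper mirroring 2.x input(), i.e. eval(raw_input()).

    read_line can be injected so the helper is testable without a console.
    """
    reader = read_line if read_line is not None else _read_line
    return eval(reader(prompt))
```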

I would start using the handy new printing function, named print3(),
instead of the print statement.
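print3() is a hypothetical name.  For reference, Python 2.6 does offer
"from __future__ import print_function", which turns print into a
function for a whole module.  The sketch below pairs that import with a
thin print3() alias and assumes it sits at the very top of its module;
Collector is an illustrative stand-in for any object with a write()
method.

```python
# A __future__ import must be the first statement; it is a no-op in 3.x.
from __future__ import print_function


class Collector(object):
    """Minimal file-like object: print() only needs a write() method."""

    def __init__(self):
        self.parts = []

    def write(self, text):
        self.parts.append(text)

    def getvalue(self):
        return "".join(self.parts)


def print3(*args, **kwargs):
    """Hypothetical print3() from the proposal: a thin alias for print()."""
    print(*args, **kwargs)


out = Collector()
print3("converted", "module", sep=": ", end="!\n", file=out)
print3("done", file=out)
```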

Then I might, one day, switch to running 2.6 in the mode that warns
about old-style classes, default comparators, string exceptions, and the
like.  Most likely, I would use fine-grained control to turn those
warnings on one at a time over a period of weeks or months.
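Part of that fine-grained control could be approximated with the
standard warnings module, assuming the 2.6 compatibility warnings arrive
as DeprecationWarning and can be told apart by their message text.  The
group names and regular expressions in PY3K_CHECKS are hypothetical
placeholders, not the exact messages Python emits.

```python
import warnings

# Hypothetical check groups; the patterns are illustrative only and are
# matched (case-insensitively) against the start of each warning message.
PY3K_CHECKS = {
    "classic_classes": r".*classic class",
    "string_exceptions": r".*string exception",
    "default_comparisons": r".*comparison",
}


def enable_checks(names, action="error"):
    """Promote the selected check groups, e.g. to errors, one at a time."""
    for name in names:
        warnings.filterwarnings(action, message=PY3K_CHECKS[name],
                                category=DeprecationWarning)
```

Enabling one more group per week turns the gradual schedule into a build
setting: once a group is promoted to an error, new regressions in that
area fail immediately while other groups stay quiet.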

Once there, I could confidently run 2to3 to build a 3.x version.  I
would do the same amount of work, but spread my risk over a much longer
schedule with more options.  I would save time through serendipity,
making changes while already maintaining the code.  My 2.6 version would
always interact correctly across modules that have been upgraded and
those that have not.

In many ways, I'm not saying anything new.  Without fine-grained
control in Python, I would expect coders to start running the "-3"
option, or running "2to3" occasionally, to see what would need to be
fixed later in the module they have just opened up.  With fine-grained
control, this could be managed.  About 80% of the way could be covered
by judicious configuration of static analyzers such as PyChecker or
PyLint; the full 100% would require options in Python itself.

Overall, the 2.x to 3.x transition needs to be fairly painless.  The
only immediate benefit for developers is better unicode processing.  The
other features are either items that will be in 2.6 eventually or have
yet to be shown to be really useful.  In a perfect world, the gradual
transition would look more like the move from C code to ANSI C code to
ANSI C compiled by a C++ compiler, and less like Perl's leaps between
versions.

Once ready, the transition from 2.7 to 3.0 would be mechanical.

