Python 2 to 3 conversion - embrace the pain

Tue Mar 17 07:26:58 EDT 2015

On Tue, 17 Mar 2015 09:36 am, Paul Rubin wrote:

> Steven D'Aprano <steve+comp.lang.python at pearwood.info> writes:
[...]
>> If we were designing Python from scratch today, here are some of the
>> changes we would certainly make: [mostly good changes]
> 
> I agree with most of those changes and I'd add some of my own that are
> even more radical.  But, they shouldn't be made in dribs and drabs in
> the production releases: it's better to make a fork of the language that
> does all the changes at the same time.

You mean like Python 3?

> Think of C++ as a fork of C in 
> that manner, although it suffered a lot from trying to be a superset.

C++ is hardly a fork. It is a distinct C-inspired language, although it was
designed to be largely compatible with C.

>> If only we could turn back the clock and redesign it with the benefit
>> of hindsight, without worrying about backwards compatibility!
> 
> We are currently getting the disadvantages of both approaches.  We break
> stuff enough to inflict pain on users, without bringing enough benefits
> to make them want to accept the pain.
> 
>> design fixes all in one lump or spread out over twenty versions, you
>> still have to break backwards compatibility, or keep carrying the
>> obsolete code forever.
> 
> If people are still using it then it's not obsolete.  Code is only
> obsolete when the last user is dead.

That's one way of looking at it, but I don't think a good way. There are
still people using Python 1.5, some in production, some for
experimentation. That doesn't mean that 1.5 isn't obsolete, and it
certainly doesn't imply that the core developers have an obligation to
continue maintaining it.

(I'm one of the latter: I keep a copy of 1.5 installed so I can check what
features were around back then and how they have changed.)

>> Of course, not *all* C code written in the 1980s still works fine,
> 
> Can you give an example of some way C changed that broke old code that
> was written in accordance with the language manuals?  Of course tons of
> C code then and now used implementation-dependent hacks that went
> outside the documented behavior, but that doesn't count.

Why shouldn't it count?

People who really take backwards compatibility seriously make even their
undocumented behaviour backwards compatible, sometimes to astonishing
degrees:

    The testers on the Windows team were going through various 
    popular applications, testing them to make sure they worked
    OK, but SimCity kept crashing. They reported this to the
    Windows developers, who disassembled SimCity, stepped through
    it in a debugger, found the bug, and added special code that
    checked if SimCity was running, and if it did, ran the memory
    allocator in a special mode in which you could still use
    memory after freeing it.

http://www.joelonsoftware.com/articles/APIWar.html

As for C, there have been a few backwards-incompatible changes over the
years. One of my favourites is that the += operator was originally spelled
=+, copying Ken Thompson's earlier language "B". That was changed in 1976.

http://cm.bell-labs.com/who/dmr/chist.html

Perhaps it is unfair to include changes from so far back, before C was an
ISO standard.
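The ambiguity of the old spelling is easy to see with `x=-1`. Here is a minimal sketch that assumes any modern C compiler. `assign_minus_one` is a made-up name used only for illustration:

```c
/* Hypothetical demo function. A modern C compiler reads "x=-1" as
 * "x = -1", so this returns -1. Under the original =op spelling the
 * same characters could also be read as "x =- 1" (what we would now
 * write as x -= 1), which would have left 4. */
int assign_minus_one(void)
{
    int x = 5;
    x=-1;   /* deliberately unspaced to show the ambiguity */
    return x;
}
```

Code that relied on the old meaning silently changed behaviour rather than failing to compile, which is the worst kind of incompatibility.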

It is true that C has been remarkably backwards compatible, particularly
compared to the popular languages of the 1970s with their oodles of
proprietary extensions, but it has changed. New keywords, of course, break
code that used those words as identifiers. So much of the language is
optional, implementation-dependent or undefined that there is lots of code
which will compile but not necessarily do the same thing on two different
systems. As Dennis Ritchie says (link above):

    There are differing dialects of C—most noticeably, those described
    by the older K&R and the newer Standard C—but on the whole, C has
    remained freer of proprietary extensions than other languages.

Post-standardisation, here are a few other backwards incompatibilities:

- C99 introduced variable-length arrays, but C11 relegated them to an
optional feature which compilers are not required to support;

- C99 changed the behaviour of floating point operations;

- C11 finally removed `gets`.

There are probably others.
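In practice, the C11 change means portable code that wants a variable-length array has to check the `__STDC_NO_VLA__` macro and supply a fallback. C11 implementations define that macro when they omit VLAs. The sketch below assumes a C99-or-later compiler. `count_distinct` and `cmp_int` are hypothetical helpers, and returning 0 on allocation failure is an arbitrary convention chosen for this example:

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for ints, written to avoid overflow from x - y. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical example: count distinct values without modifying the
 * caller's array, which needs an n-element scratch copy. */
size_t count_distinct(const int *xs, size_t n)
{
    if (n == 0)
        return 0;
#if defined(__STDC_NO_VLA__)
    /* VLAs are optional since C11; this implementation lacks them,
     * so use the heap instead. */
    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return 0;   /* arbitrary error convention for this sketch */
#else
    /* VLA on the stack: convenient for small n, risky for large n. */
    int tmp[n];
#endif
    memcpy(tmp, xs, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_int);

    /* After sorting, equal values are adjacent. */
    size_t distinct = 1;
    for (size_t i = 1; i < n; i++)
        if (tmp[i] != tmp[i - 1])
            distinct++;
#if defined(__STDC_NO_VLA__)
    free(tmp);
#endif
    return distinct;
}
```

Code written for C99, which simply used `int tmp[n];`, is not guaranteed to compile under a conforming C11 compiler without this kind of preprocessor guard.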

But the biggest problem is that C compilers vary (sometimes greatly) in what
subsets of the language they actually support. What works in compiler A may
not even compile on compiler B, let alone behave the same.

Nevertheless, I acknowledge that there is a continuum of "backwards
compatibility strictness", with explicitly unstable languages at one end,
PHP closer to that side, Python more in the middle, and ISO standardised
languages like Fortran and C at the other end.

>> but your point is taken. C, like Fortran and Java, has an even
>> stricter view of backward compatibility than Python.
> 
> Right, they take their deployed base seriously, while Python has been
> more like an experimental language by comparison.
> 
>> The downside of that is that learning and using C means you have to
>> know and learn a whole lot of ugly, dirty cruft, if only so you know
>> what it means when you see it in somebody else's code. Long after
>> feature X has been made obsolete, people still need to deal with it,
>> *even if nobody uses it*.
> 
> Can you give an example?  I wouldn't count things like gets, which
> aren't as much changes in the language, as recognition that using it was
> buggy from the start.

That's exactly the point. `gets` is dangerous (it cannot be told how big its
buffer is, so any sufficiently long input line overflows it) and needs to
die with extreme prejudice, and it is an extremely strong argument in favour
of breaking backwards compatibility. "Oops, I misspelled referer" (as HTTP's
Referer header famously is) is embarrassing but ultimately harmless. Fixing
that bug is a Nice To Have. But removal of mistakes like `gets` is
*critical*, and the failure of C to do so for so long is why "written in C"
is so often a synonym for "security holes up to your wossname".
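The standard replacement is `fgets`, which takes the buffer size and so cannot overflow it. One difference catches people out: `fgets` keeps the trailing newline, which `gets` discarded. Here is a minimal sketch of a bounded `gets` substitute. `read_line` is a hypothetical helper, not a library function:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the C library): a bounded stand-in
 * for gets(). Reads at most size-1 characters from fp into buf, strips
 * the trailing newline if one was read, and returns buf, or NULL on
 * end-of-file or error before anything was read. size must be > 0. */
char *read_line(char *buf, size_t size, FILE *fp)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return NULL;
    /* fgets keeps the '\n'; gets did not, so remove it to match. */
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

A line longer than the buffer is truncated, and the remainder stays in the stream for the next call. Callers who care must detect that case, but truncation is still far better than overwriting whatever follows the buffer.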

>> And they will. We both know that no matter how bad and foolish a
>> feature is, no matter how loudly you tell people not to use this
>> deprecated, insecure, unsafe feature, until it is actually gone people
>> will continue to write new code using it.
> 
> I haven't noticed that much in C.  Maybe it's more of an issue with C++.
> 
>> You can't please everybody. Some people want Python to change even
>> faster, some want Python to stop changing so fast...
> 
> I don't notice anyone wanting Python to change faster, in the sense of
> breaking existing code more often.  Maybe there are some specific
> proposals that I missed.

Are you on the python-ideas list?

> Do you know if any of the big Python shops (Google maybe?) are using
> Python 3 these days?  Have they migrated Python 2 codebases or are they
> using both 2 and 3?  I don't personally know (i.e. in person, not
> online) anyone using Python 3.  Everyone is still using Python 2.
> E.g. I think Twitter was using Python 2 a year or so ago (don't know
> about now).  I believe they still use Python for some things but have
> gravitated production systems towards Scala.

There are certainly people using 3, and some of them have spoken up about it
here, but 2 is still far more popular. That's only to be expected: it has
taken almost a decade of development for the Python 3 ecosystem to reach the
point where people should prefer it for new projects. It will be a long time
before Python 2 is completely gone (although perhaps not as long as some
people expect).

Anyone remember the big backwards incompatible changes made to Visual Basic?
How long did that take to settle down afterwards?

-- 
Steven