Python 2 to 3 conversion - embrace the pain

John Nagle nagle at animats.com
Fri Mar 13 17:08:15 EDT 2015


  I'm approaching the end of converting a large system from Python 2 to
Python 3.  Here's why you don't want to do this.

  The language changes aren't that bad, and they're known and
documented.  It's the package changes that are the problem.
Discovering and fixing all the new bugs takes a while.


BeautifulSoup:

BeautifulSoup 3 has been phased out. I had my own version of
BeautifulSoup 3, modified for greater robustness.  But that was
years ago.  So I converted to BeautifulSoup 4, as the documentation
says to do.

The html5lib parser is claimed to parse as a browser does, with
all the error tolerance specified in the HTML5 spec. (The spec
actually specifies, in great detail, how to handle bad HTML
consistently across browsers, and html5lib has code in it for that.)

It doesn't deliver on that promise, though. Some sites crash
BeautifulSoup 4 with html5lib.  Try "kroger.com", which has HTML
with <head><head>.  The parse tree constructed has a bad link,
and trying to use the parse tree results in exceptions.
Submitted a bug report.  It appears to be another case of
a known bug.  No workaround at this time.

https://bugs.launchpad.net/beautifulsoup/+bug/1270611
https://bugs.launchpad.net/beautifulsoup/+bug/1430633
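
To find pages with this kind of markup before they reach the
parser, something like the following might do.  It's only a rough
sketch, and it is not a workaround for the bug itself: it just uses
the standard library's tokenizer to count repeated start tags such
as the doubled <head> above.  The names TagCounter and
count_start_tags are made up for illustration.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags by name; no tree is built, so no tree can break."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag names already lowercased.
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_start_tags(html_text, tag):
    """Return how many times <tag> is opened in html_text."""
    counter = TagCounter()
    counter.feed(html_text)
    counter.close()
    return counter.counts.get(tag.lower(), 0)


sample = "<html><head><head><title>t</title></head><body></body></html>"
duplicated_head = count_start_tags(sample, "head") > 1
```

A page where duplicated_head comes out true could be logged and
skipped, or set aside as a test case for the bug reports above.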


PyMySQL:

"Pymysql is a pure Python drop-in replacement for MySQLdb".
Sounds good.  Then I discovered that LOAD DATA LOCAL wasn't
implemented in the version on PyPI.  It's on GitHub, though,
and I got the authors to push that out to PyPI.  It
works on test cases.  But it doesn't work on a big job,
because the default MySQL packet size in PyMySQL was set to 16MB.
This made the LOAD DATA LOCAL code try to send the entire
file being loaded as one giant MySQL packet.  Unless you
configure the MySQL server with 16MB buffers, this fails, with
an obscure "server has gone away" message.  I found the
problem, came up with a workaround, and submitted a bug report,
and it's being fixed.

https://github.com/PyMySQL/PyMySQL/issues/317
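
The general idea behind avoiding one giant packet is to read the
file in bounded pieces.  Here's a minimal sketch of that idea
only.  It is not PyMySQL's code and not the actual fix; the name
iter_file_chunks and the chunk size are assumptions picked for
illustration.

```python
import io


def iter_file_chunks(fileobj, max_chunk_bytes):
    """Yield successive pieces of fileobj, each at most max_chunk_bytes long."""
    if max_chunk_bytes <= 0:
        raise ValueError("max_chunk_bytes must be positive")
    while True:
        chunk = fileobj.read(max_chunk_bytes)
        if not chunk:
            # An empty read means end of file.
            break
        yield chunk


# Illustration with an in-memory "file" of 10 bytes and a 4-byte limit.
pieces = list(iter_file_chunks(io.BytesIO(b"abcdefghij"), 4))
```

Each piece would then go out as its own packet, so the server never
needs a buffer larger than the chosen limit.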


SSL:

All the new TLS/SSL support is in Python 3. That's good.
Unfortunately, using Firefox's set of SSL certs, some
important sites (such as "verisign.com") don't validate.
This turned out to be a complex problem involving Verisign
cross-signing a certificate, which created a certificate
hierarchy that some versions of OpenSSL can't handle.
There's now a version of OpenSSL that can handle it, but
the Python library has to make a call to use it, and
that's going in but isn't deployed yet.  This bug
resulted in much finger-pointing between the Python
and OpenSSL developers, the Mozilla certificate store
maintainers, and Verisign.  It's now been sorted out,
but not all the fixes are deployed.  Because "ssl" is
a core Python module, this will remain broken until the
next Python release, on both the 2.7 and 3.4 lines.

Also, for no particularly good reason, the exception
"ssl.CertificateError" is not a subclass of "ssl.SSLError",
so a handler for "ssl.SSLError" does not catch a routine
certificate failure.

Bug reports submitted for both OpenSSL and Python SSL.
Much discussion.  Problem fixed, but fix is in next
version of Python.  No workaround at this time.

http://bugs.python.org/issue23476
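
For the exception hierarchy problem (not the certificate chain
problem, which has no workaround), listing both exception classes
in the handler is one way to cope.  A rough sketch, with the helper
name call_with_tls_errors_caught made up for illustration:

```python
import ssl


def call_with_tls_errors_caught(func, *args, **kwargs):
    """Call func; return (result, None), or (None, exc) on a TLS failure."""
    try:
        return func(*args, **kwargs), None
    # Both classes are listed because, on interpreters where
    # CertificateError does not derive from SSLError, catching SSLError
    # alone would let certificate failures escape.  Where it does
    # derive from SSLError, the extra entry is harmless.
    except (ssl.SSLError, ssl.CertificateError) as exc:
        return None, exc
```

Any callable that opens a TLS connection can be passed as func; the
caller then checks the second element of the returned pair.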


Pickle:

As I posted recently, the C implementation of pickle (the
old cPickle) on Python 3.4 seems to have a memory corruption
bug.  The pure-Python pickle is fine.  So a workaround is
possible.  Bug report submitted.

http://bugs.python.org/issue23655
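
One way to force the pure-Python implementation looks roughly like
this.  It relies on pickle._Pickler and pickle._Unpickler, which
are underscore-prefixed CPython implementation details and not a
documented API, so treat it as a sketch.  The two function names
are made up for illustration.

```python
import io
import pickle


def pure_python_dumps(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Serialize obj with the pure-Python pickler, bypassing the C one."""
    buf = io.BytesIO()
    pickle._Pickler(buf, protocol).dump(obj)
    return buf.getvalue()


def pure_python_loads(data):
    """Deserialize bytes with the pure-Python unpickler."""
    return pickle._Unpickler(io.BytesIO(data)).load()


record = {"name": "example", "values": [1, 2, 3]}
round_tripped = pure_python_loads(pure_python_dumps(record))
```

The output is ordinary pickle data, so it can still be read by the
regular pickle.loads elsewhere.  Expect it to be slower than the C
version.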


Converting a large application program to Python 3
thus required diagnosing four library bugs and filing
bug reports on all of them.  Workarounds are known
for two of the problems.  I can't deploy the Python 3
version on the servers yet.

				John Nagle


