|Title:||Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7|
|Author:||Victor Stinner <victor.stinner at gmail.com>, Cory Benfield <cory at lukasa.co.uk>|
|BDFL-Delegate:||Benjamin Peterson <benjamin at python.org>|
Backport the ssl.MemoryBIO and ssl.SSLObject classes from Python 3 to Python 2.7 to enhance the overall security of Python 2.7.
While Python 2.7 is getting closer to its end-of-support date (scheduled for 2020), it is still used on production systems and the Python community is still responsible for its security. This PEP will help facilitate the future adoption of PEP 543 across all supported Python versions, which will improve security for both Python 2 and Python 3 users.
This PEP does NOT propose a general exception for backporting new features to Python 2.7 - every new feature proposed for backporting will still need to be justified independently. In particular, it will need to be explained why relying on an independently updated backport on the Python Package Index instead is not an acceptable solution.
PEP 543 defines a new TLS API for Python which would enhance Python security by giving Python applications access to the native TLS implementations on Windows and macOS, instead of using OpenSSL. A side effect is that it gives access to the system trust store and certificates installed locally by system administrators, enabling Python applications to use "company certificates" without having to modify each application and so to correctly validate TLS certificates (instead of having to ignore or bypass TLS certificate validation).
For practical reasons, Cory Benfield would like to first implement an I/O-less class similar to ssl.MemoryBIO and ssl.SSLObject for PEP 543, and to provide a second class based on the first one to use sockets or file descriptors. This design would help to structure the code to support more backends and simplify testing and auditing, as well as implementation. Later, optimized classes that use sockets or file descriptors directly may be added for performance.
While PEP 543 defines an API, the PEP would only make sense if it comes with at least one complete and good implementation. The first implementation would ideally be based on the ssl module of the Python standard library, as this is shipped to all users by default and can be used as a fallback implementation in the absence of anything more targeted.
If this backport is not performed, the only baseline implementation that could be used would be pyOpenSSL. This is problematic, however, because of the interaction with pip, which is shipped with CPython on all supported versions.
There are plans afoot to look at moving Requests to a more event-loop-y model. The Requests team does not feel at this time it is possible to abandon support for Python 2.7, so doing so would require using either Twisted or Tornado, or writing their own asynchronous abstraction.
For asynchronous code, a MemoryBIO provides substantial advantages over using a wrapped socket. It reduces the amount of buffering that must be done, works on IOCP-based reactors as well as select/poll based ones, and also greatly simplifies the reactor and implementation code. For this reason, Requests is disinclined to use a wrapped-socket-based implementation. In the absence of a backport to Python 2.7, Requests is required to use the same solution that Twisted does: namely, a mandatory dependency on pyOpenSSL.
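To illustrate why a MemoryBIO suits reactor-style code, here is a minimal sketch using the existing Python 3 API (the one this PEP proposes to backport). No socket is involved: the TLS engine writes its handshake bytes into a memory buffer, and the caller decides how and when to send them. The hostname `example.org` and the variable names are illustrative assumptions, not part of the PEP.

```python
import ssl

# Client-side context. No certificates are needed just to start a handshake.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)

# Two in-memory buffers replace the socket:
#   incoming: bytes received from the peer, fed in by the caller
#   outgoing: bytes the TLS engine wants sent to the peer
incoming = ssl.MemoryBIO()
outgoing = ssl.MemoryBIO()

# wrap_bio() returns an ssl.SSLObject that performs no I/O of its own.
tls = ctx.wrap_bio(incoming, outgoing, server_hostname="example.org")

try:
    tls.do_handshake()
except ssl.SSLWantReadError:
    # The engine has produced its first flight and now needs the peer's
    # reply, which the caller would later supply via incoming.write(...).
    pass

# The pending handshake bytes can be handed to any transport:
# a select/poll loop, an IOCP-based reactor, or a test harness.
client_hello = outgoing.read()
```

Because the caller owns all I/O, the same object works unchanged with completion-based reactors, where a wrapped socket would not.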
The pip program has to embed all its dependencies for practical reasons: namely, that it cannot rely on any other installation method being present. Since pip depends on requests, it means that it would have to embed a copy of pyOpenSSL. That would imply substantial usability pain to install pip. Currently, pip doesn't support embedding C extensions which must be compiled on each platform and so require a C compiler.
Since Python 2.7.9, Python embeds a copy of pip both for default installation and for use in virtual environments via the new ensurepip module. If pip ends up bundling pyOpenSSL, then CPython will end up bundling pyOpenSSL. Only backporting ssl.MemoryBIO and ssl.SSLObject would avoid the need to embed pyOpenSSL, and would fix the bootstrap issue (python -> ensurepip -> pip -> requests -> MemoryBIO).
This situation is less problematic than the barrier to adoption of PEP 543, as naturally Requests does not have to move to an event loop model before it drops support for Python 2.7. However, it does make it painful for Requests (and pip) to embrace both asyncio and the async and await keywords for as long as it continues to support Python 2.
Adopting this PEP would have other smaller ecosystem benefits. For example, Twisted would be able to reduce its dependency on third-party C extensions. Additionally, the pyOpenSSL development team would like to sunset the module, and this backport would free them up to do so in a graceful manner without leaving their users in the lurch.
Each of these fringe benefits, while small, also provides value to the wider Python ecosystem.
There are some concerns that people have about this backport.
A number of the Python 2 users in the world are not keeping pace with Python 2 releases. This is most often because they are using LTS releases that are not keeping pace with the minor releases of Python 2. These users would not be able to use the MemoryBIO, and so projects concerned with Python 2 compatibility may be unable to rely on the MemoryBIO being present on most of their users' systems.
This concern is reasonable. How critical it is depends on the likelihood of current users of Python 2 migrating to Python 3, or just trying to use the most recent Python 2 release. Put another way, at some point libraries will want to drop Python 2 support: the question is only whether a significant majority of their Python 2 users have moved to whatever Python 2 release contains this backport before they do so.
Ultimately, the authors of this PEP believe that the burden of this backport is sufficiently minimal to justify backporting despite this concern. If it turns out that migration to newer 2.7 releases is too slow, then the value of the work will be minimal, but if the migration to newer 2.7 releases is anything like reasonable then there will be substantial value gained.
Add MemoryBIO and SSLObject classes to the ssl module of Python 2.7.
The code will be backported and adapted from the master branch (Python 3).
The backport also significantly reduces the size of the difference between the Python 2 and Python 3 versions of the _ssl module, which makes maintenance easier.
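As a rough illustration of what the two classes offer, the sketch below shows the buffer behaviour of ssl.MemoryBIO as it exists in Python 3, together with a feature check that a library might use to decide whether the classes are available. The helper name `has_memory_bio` is hypothetical and not part of the ssl module or of this PEP.

```python
import ssl


def has_memory_bio(module=ssl):
    """Return True if the given ssl module provides both classes.

    Hypothetical helper: a library could use a check like this to fall
    back to another TLS implementation on interpreters without them.
    """
    return hasattr(module, "MemoryBIO") and hasattr(module, "SSLObject")


if has_memory_bio():
    bio = ssl.MemoryBIO()

    # write() appends bytes to the buffer; pending reports how many remain.
    bio.write(b"hello")
    pending_before = bio.pending

    # read(n) consumes up to n bytes; read() with no argument drains the rest.
    first = bio.read(2)
    rest = bio.read()

    # eof becomes True only once write_eof() has been called
    # and the buffer has been fully read.
    eof_before = bio.eof
    bio.write_eof()
    eof_after = bio.eof
```

A check of this kind addresses the concern that not every Python 2.7 installation would carry the backport: code can test for the classes at import time rather than assume a minimum release.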
This document has been placed in the public domain.