[Python-Dev] Choosing a best practice solution for Python/extension modules

Fri Feb 20 20:44:05 CET 2009

With io getting rewritten as an extension module, I think it's time to try
to come up with a good best practice for controlling when a module uses its
pure Python implementation and when it uses extension module optimizations.
This really only matters for testing: if the extension is missing, the pure
Python version is simply used.

As an example, let's just go with pickle and the Pickler class, with _pickle
as the extension module.

If you look at examples in the standard library, there seem to be two
approaches. One is simply to blast the pure Python version:

    class Pickler: pass

    try: from _pickle import Pickler
    except ImportError: pass

This is bad, though, as the only way to get a pure Python version for
testing is to clear out pickle and _pickle from sys.modules, put None in for
sys.modules['_pickle'] and then import pickle again. Yuck.
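For reference, that dance can be sketched roughly as below. The helper name
import_pure_python and the _MISSING sentinel are made up for illustration;
nothing like it is being proposed here, it just spells out the steps.

```python
import importlib
import sys

_MISSING = object()  # hypothetical sentinel: "was not in sys.modules"


def import_pure_python(py_name, ext_name):
    """Import py_name with ext_name blocked, then restore sys.modules."""
    # Step 1: clear both modules out of sys.modules, remembering them.
    saved = {name: sys.modules.pop(name, _MISSING)
             for name in (py_name, ext_name)}
    # Step 2: a None entry makes "import ext_name" raise ImportError.
    sys.modules[ext_name] = None
    try:
        # Step 3: import again; the module now falls back to pure Python.
        return importlib.import_module(py_name)
    finally:
        # Put everything back so unrelated code still sees the originals.
        for name, module in saved.items():
            if module is _MISSING:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = module


pure_pickle = import_pure_python('pickle', '_pickle')
```

Every test that wants the pure Python version has to go through something
like this, which is the part that feels wrong.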

The other option is to hide the pure Python version::

    class _Pickler: pass

    try: from _pickle import Pickler  # pickle actually imports *
    except ImportError: Pickler = _Pickler

Better, but it still means that you are mucking around with hidden names and
it hard-codes what part of the module gets replaced (using import * gets
around this, but it also blasts things like __doc__ which you probably don't
want).
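To show what "mucking around with hidden names" looks like from a test, here
is a small sketch. It assumes pickle keeps its pure Python class under the
private name _Pickler, as in the snippet above; dumps_with is a hypothetical
helper, not an existing function.

```python
import io
import pickle


def dumps_with(pickler_cls, obj):
    """Pickle obj using the given Pickler class and return the bytes."""
    buf = io.BytesIO()
    pickler_cls(buf).dump(obj)
    return buf.getvalue()


# The test has to know the private name of the pure Python class.
pure_bytes = dumps_with(pickle._Pickler, {'spam': 1})
# The public name is whichever implementation won at import time.
default_bytes = dumps_with(pickle.Pickler, {'spam': 1})
```

The test works, but it is now coupled to a name the module never promised to
keep.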

Now, from what I can tell, Antoine is suggesting having a _pyio and an _io,
and then io is simply:

    try: from _io import *
    except ImportError: from _pyio import *

That works for testing: test classes can carry an attribute naming the
module to use, and two subclasses each set that attribute to one
implementation (kind of like how test_warnings currently does it). But this
only really works for complete module replacements, not for modules like
pickle where only key portions have been rewritten (which happens more often
than a complete rewrite).
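A minimal sketch of that test layout follows. It assumes a pure Python
_pyio is importable next to io, as in the proposal; the class names and the
single test are invented for illustration.

```python
import io
import unittest

import _pyio  # assumed to exist, per the proposed io/_io/_pyio split


class BytesIOTests:
    """Shared tests; subclasses pick the implementation via 'module'."""

    module = None

    def test_roundtrip(self):
        buf = self.module.BytesIO()
        buf.write(b"spam")
        self.assertEqual(buf.getvalue(), b"spam")


class PyBytesIOTests(BytesIOTests, unittest.TestCase):
    module = _pyio  # pure Python implementation


class CBytesIOTests(BytesIOTests, unittest.TestCase):
    module = io  # whatever io resolved to, normally the extension
```

The shared base deliberately does not inherit from unittest.TestCase, so
only the two concrete subclasses get collected and run.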

So here is my crazy idea that I came up with late last night (i.e. might not
make a lot of sense).

First, the module with the pure Python code is the main version. At the end
of that module, you make a function call: ``use_extension(__name__,
'_pickle')``. That function then does some "magic"::

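    import importlib
    import sys

    _MISSING = object()  # marks names that exist only in the extension

    def use_extension(py_name, ext_name):
        try:
            ext = importlib.import_module(ext_name)
        except ImportError:
            return  # no extension available; keep the pure Python code
        py = sys.modules[py_name]
        swapped = {}
        for name in (x for x in dir(ext) if not x.startswith('__')):
            # Remember what is being replaced so the swap can be undone.
            swapped[name] = getattr(py, name, _MISSING)
            setattr(py, name, getattr(ext, name))
        py.__extension__ = ext_name, swapped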

You can also have an undo_extension('pickle') and it will unroll what was
changed. This makes choosing what version of a module to use very simple in
tests as it is a single function call in one direction or another. And doing
it this way also allows for different VMs to choose different things to
replace. For instance IronPython might decide that most of pickle is fine
and only want to change a single function with an extension; this solution
lets them do that without it being hard-coded in the standard library. At
worst other VMs simply need to refactor the Python code so that there is a
class or function that can be replaced.
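Here is one way the pair could look, as a sketch only. use_extension is
repeated so the block runs on its own; the body of undo_extension, the
_MISSING sentinel for names that exist only in the extension, and the
demo_pickle / _demo_pickle stand-in modules are all assumptions made for
illustration.

```python
import importlib
import sys
import types

_MISSING = object()  # assumed sentinel: name was absent from the Python module


def use_extension(py_name, ext_name):
    try:
        ext = importlib.import_module(ext_name)
    except ImportError:
        return
    py = sys.modules[py_name]
    swapped = {}
    for name in (x for x in dir(ext) if not x.startswith('__')):
        swapped[name] = getattr(py, name, _MISSING)
        setattr(py, name, getattr(ext, name))
    py.__extension__ = ext_name, swapped


def undo_extension(py_name):
    """Restore everything use_extension() replaced (hypothetical body)."""
    py = sys.modules[py_name]
    _ext_name, swapped = py.__dict__.pop('__extension__', (None, {}))
    for name, original in swapped.items():
        if original is _MISSING:
            delattr(py, name)  # the name came only from the extension
        else:
            setattr(py, name, original)


# Stand-in modules so the sketch does not touch the real pickle.
py_mod = types.ModuleType('demo_pickle')
py_mod.Pickler = type('Pickler', (), {'impl': 'python'})
py_mod.helper = lambda: 'pure helper'
ext_mod = types.ModuleType('_demo_pickle')
ext_mod.Pickler = type('Pickler', (), {'impl': 'extension'})
sys.modules['demo_pickle'] = py_mod
sys.modules['_demo_pickle'] = ext_mod

use_extension('demo_pickle', '_demo_pickle')   # Pickler is now the extension's
undo_extension('demo_pickle')                  # Pickler is pure Python again
```

A test would call one function or the other in its setup and leave the rest
of the module, such as helper above, untouched.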

So go ahead and tear this apart; hopefully we can reach a consensus that
makes sense, so that at least testing can be done easily.

-Brett