[Python-Dev] PEP 364, Transitioning to the Py3K standard library

Brett Cannon brett at python.org
Wed Mar 7 01:09:58 CET 2007


On 3/6/07, Barry Warsaw <barry at python.org> wrote:
>
> Here's a new PEP that's the outgrowth of work Brett Cannon and I did
> at PyCon.  Basically we were looking for a way to allow for forward
> compatibility with PEP 3108, which describes a reorganization of the
> standard library.  PEP 364 is for Python 2.x -- it doesn't say what
> the reorganized standard library should look like (that's still for
> PEP 3108 to decide), but it provides a mechanism whereby for renamed
> modules, both the old and new module names can live side-by-side.
>
> Interestingly enough, I was able to remove the one-off hack in the
> email package for its PEP 8-compliant renaming by using the code in
> the reference implementation for PEP 364.
>
> Here's the text of the PEP, which should be available shortly at
> http://www.python.org/dev/peps/pep-0364/.
>
> -Barry
>
> PEP: 364
> Title: Transitioning to the Py3K Standard Library
> Version: $Revision$
> Last-Modified: $Date$
> Author: Barry A. Warsaw <barry at python.org>
> Status: Active
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 01-Mar-2007
> Python-Version: 2.6
> Post-History:
>
>
> Abstract
> ========
>
> PEP 3108 describes the reorganization of the Python standard library
> for the Python 3.0 release [1]_.  This PEP describes a
> mechanism for transitioning from the Python 2.x standard library to
> the Python 3.0 standard library.  This transition will allow and
> encourage Python programmers to use the new Python 3.0 library names
> starting with Python 2.6, while maintaining the old names for backward
> compatibility.  In this way, a Python programmer will be able to write
> forward compatible code without sacrificing interoperability with
> existing Python programs.
>
>
> Rationale
> =========
>
> PEP 3108 presents a rationale for Python standard library (stdlib)
> reorganization.  The reader is encouraged to consult that PEP for
> details about why and how the library will be reorganized.  Should
> PEP 3108 be accepted in part or in whole, then it is advantageous to
> allow Python programmers to begin the transition to the new stdlib
> module names in Python 2.x, so that they can write forward compatible
> code starting with Python 2.6.
>
> Note that PEP 3108 proposes to remove some "silly old stuff",
> i.e. modules that are no longer useful or necessary.  The PEP you are
> reading does not address this because there are no forward
> compatibility issues for modules that are to be removed, except to
> stop using such modules.
>
> This PEP concerns only the mechanism by which mappings from old stdlib
> names to new stdlib names are maintained.  Please consult PEP 3108 for
> all specific module renaming proposals.  Specifically see the section
> titled ``Modules to Rename`` for guidelines on the old name to new
> name mappings.  The few examples in this PEP are given for
> illustrative purposes only and should not be used for specific
> renaming recommendations.
>
>
> Supported Renamings
> ===================
>
> There are at least 4 use cases explicitly supported by this PEP:
>
> - Simple top-level package name renamings, such as ``StringIO`` to
>   ``stringio``;
>
> - Sub-package renamings where the package name may or may not be
>   renamed, such as ``email.MIMEText`` to ``email.mime.text``;
>
> - Extension module renaming, such as ``cStringIO`` to ``cstringio``;
>

This feels a little misleading.  The renaming is not restricted to
extension modules but also works with any top-level module.

> - Third party renaming of any of the above.
>
> Two use cases supported by this PEP include renaming simple top-level
> modules, such as ``StringIO``, as well as modules within packages,
> such as ``email.MIMEText``.
>
> In the former case, PEP 3108 currently recommends ``StringIO`` be
> renamed to ``stringio``, following PEP 8 recommendations [2]_.
>
> In the latter case, the email 4.0 package distributed with Python 2.5
> already renamed ``email.MIMEText`` to ``email.mime.text``, although it
> did so in a one-off, uniquely hackish way inside the email package.
> The mechanism described in this PEP is general enough to handle all
> module renamings, obviating the need for the Python 2.5 hack (except
> for backward compatibility with earlier Python versions).
>
> An additional use case is to support the renaming of C extension
> modules.  As long as the C module is importable under its new name,
> the old name can be remapped to it.  For example, ``cStringIO`` can
> be remapped to ``cstringio``.
>
> Third party package renaming is also supported, via several public
> interfaces accessible by any Python module.
>

I guess a .pth file could install the mappings for the third-party modules.

> Remappings are not performed recursively.
>
>
> .mv files
> =========
>
> Remapping files are called ``.mv`` files; the suffix was chosen to be
> evocative of the Unix mv(1) command.  An ``.mv`` file is a simple
> line-oriented text file.  All blank lines and lines that start with a
> # are ignored.  All other lines must contain two whitespace separated
> fields.  The first field is the old module name, and the second field
> is the new module name.  Both module names must be specified using
> their full dotted-path names.  Here is an example ``.mv`` file from
> Python 2.6::
>
>      # Map the various string i/o libraries to their new names
>      StringIO    stringio
>      cStringIO   cstringio
>
> ``.mv`` files can appear anywhere in the file system, and there is a
> programmatic interface provided to parse them, and register the
> remappings inside them.  By default, when Python starts up, all the
> ``.mv`` files in the ``oldlib`` package are read, and their remappings
> are automatically registered.  This is where all the module remappings
> should be specified for top-level Python 2.x standard library modules.
>

Personally, I would prefer to keep this info in a dict with a
specified format.  The PEP should at least list that as an Open Issue,
or explain why a new file format was chosen over that solution (I
don't remember what you said at PyCon  =).
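
To make the comparison concrete, here is a rough sketch of a parser
for the .mv format as the PEP describes it, producing the kind of dict
I have in mind.  The names parse_mv, EXAMPLE and REMAPPINGS are made
up for illustration and are not taken from the reference
implementation; stripping leading whitespace before the # check is
also an assumption on my part.

```python
def parse_mv(text):
    """Return {old_name: new_name} parsed from the text of a .mv file."""
    mappings = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Blank lines and comment lines are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        # Every other line must hold exactly two whitespace-separated fields.
        if len(fields) != 2:
            raise ValueError(
                "line %d: expected 2 fields, got %d" % (lineno, len(fields))
            )
        old_name, new_name = fields
        # A later line for the same old name replaces the earlier one.
        mappings[old_name] = new_name
    return mappings


# The example .mv file from the PEP.
EXAMPLE = """\
# Map the various string i/o libraries to their new names
StringIO    stringio
cStringIO   cstringio
"""

# The same information expressed as a plain dict.
REMAPPINGS = parse_mv(EXAMPLE)
```

With a dict like REMAPPINGS as the specified format, the parsing step
goes away entirely, which is the trade-off I would like to see
discussed.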

>
> Implementation Specification
> ============================
>
> This section provides the full specification for how module renamings
> in Python 2.x are implemented.  The central mechanism relies on
> various import hooks as described in PEP 302 [3]_.  Specifically
> ``sys.path_importer_cache``, ``sys.path``, and ``sys.meta_path`` are
> all employed to provide the necessary functionality.
>
> When Python's import machinery is initialized, the oldlib package is
> imported.  Inside oldlib there is a class called ``OldStdlibLoader``.
> This class implements the PEP 302 interface and is automatically
> instantiated, with zero arguments.  The constructor reads all the
> ``.mv`` files from the oldlib package directory, automatically
> registering all the remappings found in those ``.mv`` files.  This is
> how the Python 2.x standard library is remapped.
>
> The OldStdlibLoader class should not be instantiated by other Python
> modules.  Instead, you can access the global OldStdlibLoader instance
> via the ``sys.stdlib_remapper`` attribute.  Use this instance if you
> want programmatic access to the remapping machinery.
>
> One important implementation detail: as needed by the PEP 302 API, a
> magic string is added to sys.path and to module __path__ attributes
> in order to hook in our remapping loader.  This magic string is
> currently ``<oldlib>``, and some changes were necessary to Python's
> site.py file in order to treat all sys.path entries starting with
> ``<`` as special.  Specifically, no attempt is made to make them
> absolute file names (since they aren't file names at all).
>

Probably should mention that sys.path_importer_cache is then set for
the magic string.
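
For readers who have not looked at site.py, the special-casing
described above amounts to something like the following sketch.  The
function name make_paths_absolute is hypothetical; this is not the
actual site.py code, just the rule as stated in the PEP.

```python
import os


def make_paths_absolute(entries):
    """Absolutize path entries, leaving '<...>' hook markers untouched."""
    result = []
    for entry in entries:
        if entry.startswith("<"):
            # A marker such as '<oldlib>' is not a file name; keep it as-is.
            result.append(entry)
        else:
            result.append(os.path.abspath(entry))
    return result
```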

> In order for the remapping import hooks to work, the module or package
> must be physically located under its new name.  This is because the
> import hooks catch only modules that are not already imported, and
> cannot be imported by Python's built-in import rules.  Thus, if a
> module has been moved, say from Lib/StringIO.py to Lib/stringio.py,
> and the former's ``.pyc`` file has been removed, then without the
> remapper, this would fail::
>
>      import StringIO
>
> Instead, with the remapper, this failing import will be caught, the
> old name will be looked up in the registered remappings, and in this
> case, the new name ``stringio`` will be found.  The remapper then
> attempts to import the new name, and if that succeeds, it binds the
> resulting module into sys.modules, under both the old and new names.
> Thus, the above import will result in entries in sys.modules for
> 'StringIO' and 'stringio', and both will point to the exact same
> module object.
>
> Note that no way to disable the remapping machinery is proposed, short
> of moving all the ``.mv`` files away or programmatically removing them
> in some custom start up code.  In Python 3.0, the remappings will be
> eliminated, leaving only the "new" names.
>

There is no mention of what the sys.meta_path importer/loader is used for.
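
To show the role such a hook could play, here is a minimal sketch of
the fallback behaviour the section describes: a failed import of an
old name is caught, the new name is imported, and the same module
object ends up under both names in sys.modules.  It is written against
the current importlib API rather than the Python 2.x hooks the PEP
targets, and RemapFinder and install_remapper are invented names, so
treat it as an illustration and not as the reference implementation.

```python
import importlib
import importlib.abc
import importlib.util
import sys


class RemapFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Fallback finder that serves old module names from their new homes."""

    def __init__(self, mappings):
        self._mappings = dict(mappings)  # old dotted name -> new dotted name
        self._saved_specs = {}           # old name -> the module's own spec

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self._mappings:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # not a remapped name; the import fails as usual

    def create_module(self, spec):
        # Import the module under its new name and hand back that object.
        module = importlib.import_module(self._mappings[spec.name])
        # The import system is about to overwrite module.__spec__ with the
        # old-name spec, so remember the real one.
        self._saved_specs[spec.name] = module.__spec__
        return module

    def exec_module(self, module):
        # The module already ran under its new name; only restore its spec.
        old_name = module.__spec__.name
        module.__spec__ = self._saved_specs.pop(old_name)


def install_remapper(mappings):
    """Append the finder so it runs only after the normal finders fail."""
    finder = RemapFinder(mappings)
    sys.meta_path.append(finder)
    return finder
```

After install_remapper({"email.MIMEText": "email.mime.text"}), an
"import email.MIMEText" on a Python that no longer ships the old name
binds the email.mime.text module under both names.  Appending to
sys.meta_path, rather than inserting at the front, matches the PEP's
statement that the hooks only see imports the built-in rules cannot
satisfy.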

>
> Programmatic Interface
> ======================
>
> Several methods are added to the ``sys.stdlib_remapper`` object, which
> third party packages can use to register their own remappings.

Is this really necessary?  It might be helpful if this use of mapping
old to new names gets picked up and used heavily, but otherwise this
might be overkill.  You can easily get the object from
sys.path_importer_cache as long as you know the magic string used on
sys.path.

>  Note
> however that in all cases, there is one and only one mapping from an
> old name to a new name.  If two ``.mv`` files contain different
> mappings for an old name, or if a programmatic call is made with an
> old name that is already remapped, the previous mapping is lost.  This
> will not affect any already imported modules.
>
> The following methods are available on the ``sys.stdlib_remapper``
> object:
>
> - ``read_mv_file(filename)`` -- Read the given file and register all
>   remappings found in the file.
>
> - ``read_directory_mv_files(dirname, suffix='.mv')`` -- List the given
>   directory, reading all files in that directory that have the
>   matching suffix (``.mv`` by default).  For each parsed file,
>   register all the remappings found in that file.
>
> - ``set_mapping(oldname, newname)`` -- Register a new mapping from an
>   old module name to a new module name.  Both must be the full
>   dotted-path name to the module.  newname may be ``None`` in which
>   case any existing mapping for oldname will be removed (it is not an
>   error if there is no existing mapping).
>
> - ``get_mapping(oldname, default=None)`` -- Return any registered
>   newname for the given oldname.  If there is no registered remapping,
>   default is returned.
>

If you are going to have the object accessible from outside
sys.path_importer_cache I would also add an attribute that contains
the magic string used on sys.path.
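
Something along these lines is what I am picturing.  This is only a
toy sketch of the set_mapping()/get_mapping() semantics quoted above
plus the extra attribute; the class name StdlibRemapper, the attribute
name path_entry, and the importer_cache dict (a stand-in for
sys.path_importer_cache, so the sketch does not touch the real import
state) are all hypothetical.

```python
class StdlibRemapper:
    """Toy registry mirroring set_mapping()/get_mapping() from the PEP."""

    # Hypothetical attribute exposing the magic sys.path entry.
    path_entry = "<oldlib>"

    def __init__(self):
        self._mappings = {}

    def set_mapping(self, oldname, newname):
        if newname is None:
            # Removing a mapping that does not exist is not an error.
            self._mappings.pop(oldname, None)
        else:
            # One mapping per old name: a later call replaces the earlier one.
            self._mappings[oldname] = newname

    def get_mapping(self, oldname, default=None):
        return self._mappings.get(oldname, default)


# Stand-in for sys.path_importer_cache, keyed by the magic string.
importer_cache = {}
remapper = StdlibRemapper()
importer_cache[remapper.path_entry] = remapper

# Third-party code can find the object from the magic string alone.
found = importer_cache[StdlibRemapper.path_entry]
found.set_mapping("StringIO", "stringio")
```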



Thanks for writing this up, Barry!  I am just glad our idea for this
actually worked in the end.  =)

-Brett

