[Python-checkins] r77829 - in peps/trunk: pep-3147-1.dia pep-3147-1.png pep-3147.txt

barry.warsaw python-checkins at python.org
Fri Jan 29 21:37:07 CET 2010


Author: barry.warsaw
Date: Fri Jan 29 21:37:07 2010
New Revision: 77829

Log:
PEP 3147, PYC Repository Directories, Warsaw



Added:
   peps/trunk/pep-3147-1.dia   (contents, props changed)
   peps/trunk/pep-3147-1.png   (contents, props changed)
   peps/trunk/pep-3147.txt

Added: peps/trunk/pep-3147-1.dia
==============================================================================
Binary file. No diff available.

Added: peps/trunk/pep-3147-1.png
==============================================================================
Binary file. No diff available.

Added: peps/trunk/pep-3147.txt
==============================================================================
--- (empty file)
+++ peps/trunk/pep-3147.txt	Fri Jan 29 21:37:07 2010
@@ -0,0 +1,379 @@
+PEP: 3147
+Title: PYC Repository Directories
+Version: $Revision$
+Last-Modified: $Date$
+Author: Barry Warsaw <barry at python.org>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 2009-12-16
+Python-Version: 3.2
+Post-History:
+
+
+Abstract
+========
+
+This PEP describes an extension to Python's import mechanism which
+improves sharing of Python source code files among multiple different
+installed versions of the Python interpreter.  It does this by
+allowing many different byte compilation files (.pyc files) to be
+co-located with the Python source file (.py file).  The extension
+described here can also be used to support different Python
+compilation caches, such as JIT output that may be produced by an
+Unladen Swallow [1]_ enabled CPython.
+
+
+Rationale
+=========
+
+Linux distributions such as Ubuntu [2]_ and Debian [3]_ provide more
+than one Python version at the same time to their users.  For example,
+Ubuntu 9.10 Karmic Koala can install Python 2.5, 2.6, and 3.1, with
+Python 2.6 being the default.
+
+In order to ease the burden on operating system packagers for these
+distributions, the distribution packages do not contain Python version
+numbers [4]_; they are shared across all Python versions installed on
+the system.  Putting Python version numbers in the packages would be a
+maintenance nightmare, since all the packages - *and their
+dependencies* - would have to be updated every time a new Python
+release was added to or removed from the distribution.  Because of the
+sheer number of packages available, this amount of work is infeasible.
+
+For pure Python modules, sharing is possible because upstream
+maintainers typically support multiple versions of Python in a source
+compatible way.  In practice, however, pyc files are not compatible
+across Python major releases.  A reading of import.c [5]_ in the
+Python source code shows that, within recent memory, every new
+CPython major release has bumped the pyc magic number.
+
+Even C extensions can be source compatible across multiple versions of
+Python.  Compiled extension modules are usually not compatible though,
+and PEP 384 [6]_ has been proposed to address this by defining a
+stable ABI for extension modules.
+
+Because the distributions cannot share pyc files, elaborate mechanisms
+have been developed to put the resulting pyc files in non-shared
+locations while the source code is still shared.  Examples include the
+symlink-based Debian regimes python-support [7]_ and python-central
+[8]_.  These approaches make for much more complicated, fragile,
+inscrutable, and fragmented policies for delivering Python
+applications to a wide range of users.  Arguably more users get Python
+from their operating system vendor than from upstream tarballs.  Thus,
+solving this pyc sharing problem for CPython is a high priority for
+such vendors.
+
+This PEP proposes a solution to this problem.
+
+
+Proposal
+========
+
+Python's import machinery is extended to search for byte code cache
+files in a directory co-located with the source file, with the same
+base name as the source file but the extension 'pyr'.  The pyr
+directory contains individual files with the cached byte compilation
+of the source code, identical to current pyc and pyo files.  The
+files inside the pyr directory retain their file extensions, but the
+base name is replaced by the hexlified [10]_ magic number of the
+Python version the byte code is compatible with.
+
+The file extension pyr was chosen because 'r' is a mnemonic for
+'repository', and there appear to be no prior uses of the extension
+[9]_.
+
+For example, a module `foo` with source code in `foo.py` and byte
+compiled with Python 2.5, Python 2.6, Python 2.6 `-O`, Python 2.6
+`-U`, and Python 3.1 would have the following file system layout::
+
+    foo.py
+    foo.pyr/
+        b3f20d0a.pyc # Python 2.5
+        d1f20d0a.pyc # Python 2.6
+        d1f20d0a.pyo # Python 2.6 -O
+        d2f20d0a.pyc # Python 2.6 -U
+        4f0c0d0a.pyc # Python 3.1
+
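+The naming scheme can be sketched in a few lines of Python.  This is a
+minimal sketch under stated assumptions, not part of the proposal: the
+helpers `magic_bytes()` and `pyr_cache_path()` are hypothetical, and
+the pyc header is assumed to follow CPython's import.c, which stores
+the 16-bit magic number little-endian followed by a carriage return
+and a line feed.  The magic numbers 62131 (Python 2.5) and 62161
+(Python 2.6) come from the comments in import.c [5]_, and
+`binascii.hexlify()` is applied to the four header bytes in the order
+they appear on disk.
+
```python
import binascii
import importlib.util
import os
import struct


def magic_bytes(magic_number):
    """Return the 4-byte pyc header magic for a 16-bit magic number."""
    # Little-endian 16-bit number followed by b'\r\n', as in a pyc header.
    return struct.pack('<H', magic_number) + b'\r\n'


def pyr_cache_path(source_path, magic, optimized=False):
    """Map foo.py to foo.pyr/<hexlified magic>.pyc (or .pyo)."""
    base, _ = os.path.splitext(source_path)
    name = binascii.hexlify(magic).decode('ascii')
    ext = '.pyo' if optimized else '.pyc'
    return os.path.join(base + '.pyr', name + ext)


# Python 2.5: prints foo.pyr/b3f20d0a.pyc on POSIX
print(pyr_cache_path('foo.py', magic_bytes(62131)))
# Python 2.6 -O: prints foo.pyr/d1f20d0a.pyo on POSIX
print(pyr_cache_path('foo.py', magic_bytes(62161), optimized=True))
# The interpreter running this sketch
print(pyr_cache_path('foo.py', importlib.util.MAGIC_NUMBER))
```
+
+On Python 3.4 and later, the running interpreter's four magic bytes are
+available as `importlib.util.MAGIC_NUMBER`, which the last line uses.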
+
+Python behavior
+===============
+
+When Python searches for a module to import (say `foo`), it may find
+one of several situations.  As per current Python rules, the term
+"matching pyc" means that the magic number matches the current
+interpreter's magic number, and the source file's modification time
+matches the one recorded in the `pyc` file.
+
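+The "matching pyc" test can be sketched as follows.  The sketch
+assumes the 8-byte header used by CPython up to version 3.2 (four magic
+bytes, then the source modification time as a little-endian 32-bit
+integer), and `is_matching_pyc()` is a hypothetical helper.  As in
+CPython's import.c, the recorded time is compared for equality with the
+source file's current modification time.
+
```python
import struct


def is_matching_pyc(header, magic, source_mtime):
    """Return True if a pyc header matches this interpreter and the source.

    header:       at least the first 8 bytes of the pyc file.
    magic:        the running interpreter's 4 magic bytes.
    source_mtime: the source file's modification time (e.g. st_mtime).
    """
    if len(header) < 8 or header[:4] != magic:
        return False  # truncated file or byte code from another version
    (recorded,) = struct.unpack('<I', header[4:8])
    # Only 32 bits of the timestamp fit in the header, so compare those.
    return recorded == (int(source_mtime) & 0xFFFFFFFF)
```
+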
+When Python finds a `foo.py` file for which no `foo.pyc` file or
+`foo.pyr` directory exists, Python will by default load the `foo.py`
+file and write a `foo.pyc` file next to the source file.  This is
+unchanged from current behavior.
+
+When the Python executable is given a `-R` flag, or the environment
+variable `$PYTHONPYR` is set, then Python will create a `foo.pyr`
+directory and write a `pyc` file to that directory with the hexlified
+magic number as the base name.
+
+If during import, Python finds an existing `pyc` file but no `pyr`
+directory, and the `$PYTHONPYR` environment variable is not set, then
+the `pyc` file is loaded as normal and no `pyr` directory is created.
+
+If during import, Python finds a `pyr` directory with a matching `pyc`
+file, then *regardless* of whether `$PYTHONPYR` is set,
+`foo.pyr/<magic>.pyc` is loaded and import completes successfully.
+Thus a matching `pyc` file inside a `pyr` directory always takes
+precedence over a sibling `pyc` file.
+
+If during import, Python finds a `pyr` directory that does not contain
+a matching `pyc` file, and no sibling `foo.pyc` file exists, Python
+will load the source file and write a sibling `foo.pyc` file, unless
+the `-R` flag is given, in which case a `foo.pyr/<magic>.pyc` file
+will be written.
+
+Here is a flowchart illustrating the rules.
+
+.. image:: pep-3147-1.png
+   :scale: 75
+
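+The same rules can be expressed as a small decision function.  This is
+only a sketch: `plan_import()` and `ImportPlan` are hypothetical names,
+`pyr_requested` stands for "the `-R` flag was given or `$PYTHONPYR` is
+set", and because the text does not spell out what happens when a
+matching sibling `foo.pyc` exists while `-R` is in effect, the sketch
+assumes that file is simply loaded.
+
```python
from typing import NamedTuple, Optional


class ImportPlan(NamedTuple):
    load_from: str           # 'pyr', 'pyc' or 'source'
    write_to: Optional[str]  # 'pyr', 'pyc' or None (nothing is written)


def plan_import(pyr_requested, pyr_has_match, sibling_matches):
    """Decide what to load and what to write when importing foo.

    pyr_requested:   -R was given or $PYTHONPYR is set.
    pyr_has_match:   foo.pyr/<magic>.pyc exists and is a matching pyc.
    sibling_matches: a sibling foo.pyc exists and is a matching pyc.
    """
    # A matching pyc inside foo.pyr always wins, with or without -R.
    if pyr_has_match:
        return ImportPlan('pyr', None)
    # Assumption: a matching sibling foo.pyc is loaded as in current Python.
    if sibling_matches:
        return ImportPlan('pyc', None)
    # No usable byte code: compile foo.py.  With -R/$PYTHONPYR the result
    # goes to foo.pyr/<magic>.pyc and any stale sibling foo.pyc is left
    # alone; otherwise foo.pyc is written (or overwritten) as today.
    return ImportPlan('source', 'pyr' if pyr_requested else 'pyc')


# Prints: ImportPlan(load_from='source', write_to='pyr')
print(plan_import(pyr_requested=True, pyr_has_match=False, sibling_matches=False))
```
+
+Returning a plan instead of touching the file system keeps each branch
+easy to check against the flowchart.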
+
+Effects on non-conforming Python versions
+=========================================
+
+Python implementations which don't know anything about `pyr`
+directories will ignore them.  This means that they will read and
+write `pyc` files as usual.  A conforming implementation will still
+prefer any existing `foo.pyr/<magic>.pyc` file over an existing
+sibling `pyc` file.
+
+The one possible conflicting state is where a sibling `pyc` file
+exists, but its magic number does not match.
+
+In the default case, when Python finds a `pyc` file with a
+non-matching magic number, it simply overwrites the `pyc` file with
+the new byte code and magic number.  In the absence of the `-R` flag,
+this remains unchanged.  When the `-R` flag is given, the
+non-matching sibling `pyc` file is ignored - it is neither removed nor
+overwritten - and a `foo.pyr/<magic>.pyc` file is written instead.
+
+
+Implementation strategy
+=======================
+
+This feature is targeted for Python 3.2, solving the problem for that
+and all future versions.  It may be backported to Python 2.7.
+Vendors are free to backport the changes to earlier distributions as
+they see fit.
+
+
+Alternatives
+============
+
+PEP 304
+-------
+
+There is some overlap between the goals of this PEP and PEP 304 [12]_,
+which has been withdrawn.  PEP 304 would have allowed a user to
+create a shadow file system hierarchy in which to store `pyc` files.
+This concept of a shadow hierarchy for `pyc` files could be used to
+satisfy the aims of this PEP.  Although PEP 304 does not indicate
+why it was withdrawn, shadow directories have a number of problems.
+The location of the shadow `pyc` files would not be easily discovered
+and would depend on the proper and consistent use of the
+`$PYTHONBYTECODE` environment variable both by the system and by end
+users.  There are also global implications, meaning that while the
+system might want to shadow `pyc` files, users might not want to, but
+the PEP defines only an all-or-nothing approach.
+
+As an example of the problem, a common (though fragile) Python idiom
+for locating data files is to do something like this::
+
+    from os.path import dirname, join
+    import foo.bar
+    data_file = join(dirname(foo.bar.__file__), 'my.dat')
+
+This would be problematic since `foo.bar.__file__` would give the
+location of the `pyc` file in the shadow directory, and it may not be
+possible to find the `my.dat` file relative to the source directory
+from there.
+
+On the other hand, this PEP keeps all byte code artifacts co-located
+with the source file.  Some adjustment will have to be made for the
+fact that the `pyc` file lives in a subdirectory.  For example, in
+current Python, when you import a module, its `__file__` attribute
+points to its `pyc` file.  A package's `__file__` points to the `pyc`
+file for its `__init__.py`.  E.g.::
+
+    >>> import foo
+    >>> foo.__file__
+    'foo.pyc'
+    >>> # baz is a package
+    >>> import baz
+    >>> baz.__file__
+    'baz/__init__.pyc'
+
+The implementation of this PEP would have to ensure that `__file__`
+returns a path at the same directory level as it does without the
+`pyr` directory, so that the common idiom above continues to work::
+
+    >>> import foo
+    >>> foo.__file__
+    'foo.pyr'
+    >>> # baz is a package
+    >>> import baz
+    >>> baz.__file__
+    'baz/__init__.pyr'
+
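+One way an implementation might derive that `__file__` value from the
+path of the byte code it actually loaded is sketched below; the helper
+`reported_file()` is hypothetical.  Reporting the `pyr` directory keeps
+`os.path.dirname(__file__)` pointing at the directory that holds the
+source file, which is what the data file idiom depends on.
+
```python
import os


def reported_file(loaded_path):
    """Map e.g. foo.pyr/b3f20d0a.pyc to the __file__ value 'foo.pyr'."""
    parent = os.path.dirname(loaded_path)
    if parent.endswith('.pyr'):
        # Byte code came from a pyr directory: report the directory itself.
        return parent
    # A sibling pyc (or source file) is reported unchanged, as today.
    return loaded_path


module_file = reported_file(os.path.join('pkg', 'foo.pyr', 'b3f20d0a.pyc'))
data_file = os.path.join(os.path.dirname(module_file), 'my.dat')
print(data_file)  # pkg/my.dat on POSIX
```
+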
+Note that some existing Python code only checks for `.py` and `.pyc`
+file extensions (and possibly `.pyo`).  Such checks would have to be
+extended to also accept the `.pyr` extension.
+
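+Such a check might look like the following sketch.  The
+`MODULE_EXTENSIONS` tuple and the `source_for()` helper are
+hypothetical; the only change this PEP implies is the extra `.pyr`
+entry.
+
```python
import os

# Extensions a module's __file__ may end with; '.pyr' is the new entry.
MODULE_EXTENSIONS = ('.py', '.pyc', '.pyo', '.pyr')


def source_for(module_file):
    """Return the .py path corresponding to a module's __file__ value."""
    base, ext = os.path.splitext(module_file)
    if ext not in MODULE_EXTENSIONS:
        raise ValueError('not a Python module file: %r' % (module_file,))
    return base + '.py'
```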
+
+Fat byte compilation files
+--------------------------
+
+An earlier version of this PEP described "fat" Python byte code files.
+These files would contain the equivalent of multiple `pyc` files in a
+single `pyf` file, with a lookup table keyed off the appropriate magic
+number.  This was an extensible file format so that the first 5
+parallel Python implementations could be supported fairly efficiently,
+but with extension lookup tables available to scale `pyf` byte code
+objects as large as necessary.
+
+The fat byte compilation files were fairly complex, so the current
+simplification of using directories was suggested.
+
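+To make the comparison concrete, here is a sketch of what a fat file's
+lookup table might have looked like.  The layout is invented for
+illustration and is not the format from the earlier draft: a fixed
+table of five (magic, offset, length) entries followed by the
+concatenated byte code, with no extension tables.  Even this reduced
+form needs its own packing and lookup code, whereas a `pyr` directory
+lets the file system do the lookup.
+
```python
import struct

SLOTS = 5                        # room for five versions/implementations
ENTRY = struct.Struct('<4sII')   # magic bytes, offset, length
HEADER_SIZE = SLOTS * ENTRY.size


def build_pyf(blobs):
    """Pack a {magic: byte_code} dict into one hypothetical fat file."""
    if len(blobs) > SLOTS:
        raise ValueError('more than %d versions would need extension tables' % SLOTS)
    table = b''
    body = b''
    for magic, code in blobs.items():
        # Offsets are measured from the start of the file.
        table += ENTRY.pack(magic, HEADER_SIZE + len(body), len(code))
        body += code
    table += b'\0' * (HEADER_SIZE - len(table))  # zero-fill unused slots
    return table + body


def lookup_pyf(data, magic):
    """Return the byte code stored for magic, or None if it is absent."""
    for i in range(SLOTS):
        slot_magic, offset, length = ENTRY.unpack_from(data, i * ENTRY.size)
        if slot_magic == magic:
            return data[offset:offset + length]
    return None
```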
+
+Multiple file extensions
+------------------------
+
+The PEP author also considered an approach where multiple thin byte
+compiled files lived in the same place, but used different file
+extensions to designate the Python version.  E.g. foo.pyc25,
+foo.pyc26, foo.pyc31 etc.  This was rejected because of the clutter
+involved in writing so many different files.  The multiple extension
+approach makes it more difficult (and an ongoing task) to update any
+tools that are dependent on the file extension.
+
+
+Open questions
+==============
+
+* Are there any concurrency issues added by this PEP, above those that
+  already exist?  For example, what if two Python processes attempt to
+  write the same `<magic>.pyc` file?  Is that any different than two
+  Python processes trying to write to the same `foo.pyc` file?
+  Current thinking is that there is not, since the exclusive open
+  mechanism currently used will still be used to open `pyc` files
+  inside a `pyr` directory.
+
+* How do the imp [13]_ and importlib [14]_ modules need to be updated
+  to support `pyr` directories?
+
+* What about `py` source files that are compatible with most but not
+  all installed Python versions?  We might need a way to say "this py
+  file should be hidden from Python versions X.Y or earlier".  There
+  are three options:
+
+  - Use file system tricks to only share py files that are actually
+    sharable in all installed Python versions (e.g. different search
+    directories for Python X.Y and Python X.Z).
+  - Introduce Python syntax that is legal before `__future__` imports
+    and is evaluated to determine if the py file is compatible,
+    raising an `ImportError('no module named foo')` if not.
+  - Add an optional metadata file co-located with the py file that
+    declares which Python versions it is compatible with.
+
+  How does this requirement interact with PEP 382 namespace packages [15]_?
+
+* Are there any opportunities for also sharing extension modules
+  (.so/.dll files) in a `pyr` directory?
+
+* Would a moratorium on byte code changes, similar to the language
+  moratorium described in PEP 3003 [16]_ be a better approach to
+  pursue, and would that solve the problem for vendors?  At the time
+  of this writing, PEP 3003 is silent on the issue.
+
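+The exclusive open mentioned in the first question can be sketched as
+follows.  This is not CPython's code: `write_pyc_exclusive()` is a
+hypothetical helper that creates `foo.pyr` if needed and then creates
+the `<magic>.pyc` file with `O_CREAT | O_EXCL`, so when two processes
+race to write the same file, only one of them writes it and the other
+backs off.
+
```python
import os


def write_pyc_exclusive(path, data):
    """Write data to path only if no other process has created it first.

    Returns True if this call wrote the file, False if it already existed.
    """
    directory = os.path.dirname(path)
    if directory:
        # Creating foo.pyr may race too; an existing directory is fine.
        os.makedirs(directory, exist_ok=True)
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(os, 'O_BINARY', 0)
    try:
        fd = os.open(path, flags, 0o644)
    except FileExistsError:
        return False  # another writer won the race; leave its file alone
    with os.fdopen(fd, 'wb') as f:
        f.write(data)
    return True
```
+
+This only prevents two writers from interleaving.  A concurrent reader
+can still see a partially written file, so readers must keep
+validating the header, and an implementation might write the timestamp
+last so that an incomplete file never matches.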
+
+Reference implementation
+========================
+
+TBD
+
+
+References
+==========
+
+.. [1] PEP 3146
+
+.. [2] Ubuntu: <http://www.ubuntu.com>
+
+.. [3] Debian: <http://www.debian.org>
+
+.. [4] Debian Python Policy:
+   http://www.debian.org/doc/packaging-manuals/python-policy/
+
+.. [5] import.c:
+   http://svn.python.org/view/python/branches/py3k/Python/import.c?view=markup
+
+.. [6] PEP 384
+
+.. [7] python-support:
+   http://wiki.debian.org/DebianPythonFAQ#Whatispython-support.3F
+
+.. [8] python-central:
+   http://wiki.debian.org/DebianPythonFAQ#Whatispython-central.3F
+
+.. [9] http://www.filesuffix.com/?m=search&e=pyr&submit=Search
+
+.. [10] binascii.hexlify():
+   http://www.python.org/doc/current/library/binascii.html#binascii.hexlify
+
+.. [11] The marshal module:
+   http://www.python.org/doc/current/library/marshal.html
+
+.. [12] PEP 304:
+
+.. [13] imp: http://www.python.org/doc/current/library/imp.html
+
+.. [14] importlib: http://docs.python.org/3.1/library/importlib.html
+
+.. [15] PEP 382
+
+.. [16] PEP 3003
+
+
+Acknowledgments
+===============
+
+Barry Warsaw's original idea was for fat Python byte code files.
+Martin von Loewis reviewed an early draft of the PEP and suggested the
+simplification to store traditional `pyc` and `pyo` files in a
+directory.  Many other people reviewed early versions of this PEP and
+provided useful feedback including:
+
+* David Malcolm
+* Josselin Mouette
+* Matthias Klose
+* Michael Hudson
+* Michael Vogt
+* Piotr Ożarowski
+* Scott Kitterman
+* Toshio Kuratomi
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End:

