[Import-SIG] PEP for the removal of PYO files

Guido van Rossum guido at python.org
Fri Feb 27 19:01:53 CET 2015


I'm in a good mood today and I think this is a great idea! That's not to
say that I'm accepting it as-is (I haven't read it fully) but I expect that
there are very few downsides and it won't break much. (There's of course
always going to be someone who always uses -O and somehow depends on the
existence of .pyo files, but they should have seen it coming with
__pycache__ and the new version-specific extensions. :-)

On Fri, Feb 27, 2015 at 9:06 AM, Brett Cannon <bcannon at gmail.com> wrote:

> Here is my proposed PEP to drop .pyo files from Python. Thanks to Barry's
> work in PEP 3147 this really shouldn't have much impact on users' code
> (then again, bytecode files are basically an implementation detail, so it
> should hardly impact anyone directly).
>
> One thing I would appreciate is more motivation for this. While the
> maintainer of importlib in me wants to see this happen, the core
> developer in me thinks the arguments are a little weak. So if people
> can provide more reasons why this is a good thing, that would be
> appreciated.
>
>
> PEP: 487
> Title: Elimination of PYO files
> Version: $Revision$
> Last-Modified: $Date$
> Author: Brett Cannon <brett at python.org>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 20-Feb-2015
> Post-History:
>
> Abstract
> ========
>
> This PEP proposes eliminating the concept of PYO files from Python.
> To continue the support of the separation of bytecode files based on
> their optimization level, this PEP proposes extending the names of
> PYC files in the bytecode repository directory (i.e., the
> ``__pycache__`` directory) to include the optimization level.
>
>
> Rationale
> =========
>
> As of today, bytecode files come in two flavours: PYC and PYO. A PYC
> file is the bytecode file generated and read from when no
> optimization level is specified at interpreter startup (i.e., ``-O``
> is not specified). A PYO file represents the bytecode file that is
> read/written when **any** optimization level is specified (i.e., when
> ``-O`` is specified, including ``-OO``). This means that while PYC
> files clearly delineate the optimization level used when they were
> generated -- namely no optimizations beyond the peepholer -- the same
> is not true for PYO files. Put in terms of optimization levels and
> the file extension:
>
>   - 0: ``.pyc``
>   - 1 (``-O``): ``.pyo``
>   - 2 (``-OO``): ``.pyo``
>
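As a minimal sketch of the ambiguity described here, the pre-PEP naming can be modelled as choosing the extension from the optimization level alone. The helper ``legacy_bytecode_name`` is hypothetical and ``cpython-34`` is only an example cache tag; the point is that levels 1 and 2 produce the same file name:

```python
def legacy_bytecode_name(module_name, cache_tag, optimize):
    """Hypothetical model of pre-PEP naming: every non-zero
    optimization level maps to the same .pyo extension."""
    suffix = '.pyc' if optimize == 0 else '.pyo'
    return '{}.{}{}'.format(module_name, cache_tag, suffix)


for level in (0, 1, 2):
    print(level, legacy_bytecode_name('importlib', 'cpython-34', level))
# 0 importlib.cpython-34.pyc
# 1 importlib.cpython-34.pyo
# 2 importlib.cpython-34.pyo
```

Nothing in the ``.pyo`` name records whether ``-O`` or ``-OO`` produced it.
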
> The reuse of the ``.pyo`` file extension for both level 1 and 2
> optimizations means that there is no clear way to tell what
> optimization level was used to generate the bytecode file. In terms
> of reading PYO files, this can lead to an interpreter running code
> compiled at a mixture of optimization levels if the user was not
> careful to make sure all PYO files were generated using the same
> optimization level (typically done by blindly deleting all PYO files
> and then using the ``compileall`` module to compile all-new PYO files
> [1]_).
> This issue is only compounded when people optimize Python code beyond
> what the interpreter natively supports, e.g., using the astoptimizer
> project [2]_.
>
> In terms of writing PYO files, the need to delete all PYO files
> every time one either changes the optimization level one wants to use
> or is unsure of which optimization level was used the last time PYO
> files were generated leads to unnecessary file churn.
>
> As for distributing bytecode-only modules, having to distribute both
> ``.pyc`` and ``.pyo`` files is unnecessary for the common use-case
> of code obfuscation and smaller file deployments.
>
>
> Proposal
> ========
>
> To eliminate the ambiguity that PYO files present, this PEP proposes
> eliminating the concept of PYO files and their accompanying ``.pyo``
> file extension. To allow for the optimization level to be unambiguous
> as well as to avoid having to regenerate optimized bytecode files
> needlessly in the ``__pycache__`` directory, the optimization level
> used to generate a PYC file will be incorporated into the bytecode
> file name. Currently bytecode file names are created by
> ``importlib.util.cache_from_source()``, approximately using the
> following expression defined by PEP 3147 [3]_, [4]_, [5]_::
>
>     '{name}.{cache_tag}.pyc'.format(name=module_name,
>                                     cache_tag=sys.implementation.cache_tag)
>
> This PEP proposes to change the expression to::
>
>     '{name}.{cache_tag}.opt-{optimization}.pyc'.format(
>             name=module_name,
>             cache_tag=sys.implementation.cache_tag,
>             optimization=str(sys.flags.optimize))
>
> The "opt-" prefix was chosen so as to provide a visual separator
> from the cache tag. The placement of the optimization level after
> the cache tag was chosen to preserve the lexicographic sort order of
> bytecode file names by module name and cache tag (the cache tag does
> not vary for a single interpreter). The "opt-" prefix was chosen over
> "o" so as to be somewhat self-documenting. The "opt-" prefix was
> chosen over "O" so as to avoid any confusion with "0" when placed so
> close to the interpreter version number.
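A minimal sketch of the sort-order argument, using hypothetical file names for one module compiled by two interpreters (cache tags ``cpython-34`` and ``cpython-35``) at levels 0 and 2. The helper ``cache_tag_runs`` is illustrative only: it sorts the names and reports the sequence of cache tags, collapsing adjacent repeats. With the level after the cache tag, each interpreter's files stay contiguous; putting the level first (one of the alternatives listed under Open Issues) interleaves them:

```python
import itertools

# Hypothetical names under the proposed scheme (level after the cache tag).
proposed = [
    "importlib.cpython-35.opt-2.pyc",
    "importlib.cpython-34.opt-0.pyc",
    "importlib.cpython-35.opt-0.pyc",
    "importlib.cpython-34.opt-2.pyc",
]
# Same files with the level placed before the cache tag.
alternative = [
    "importlib.O2.cpython-35.pyc",
    "importlib.O0.cpython-34.pyc",
    "importlib.O0.cpython-35.pyc",
    "importlib.O2.cpython-34.pyc",
]


def cache_tag_runs(names, tag_field):
    """Sort names, pick the dot-separated cache-tag field, and collapse
    adjacent duplicates so interleaving becomes visible."""
    tags = [name.split(".")[tag_field] for name in sorted(names)]
    return [tag for tag, _ in itertools.groupby(tags)]


print(cache_tag_runs(proposed, 1))     # ['cpython-34', 'cpython-35']
print(cache_tag_runs(alternative, 2))  # ['cpython-34', 'cpython-35', 'cpython-34', 'cpython-35']
```
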
>
> A period was chosen over a hyphen as a separator so as to distinguish
> clearly that the optimization level is not part of the interpreter
> version as specified by the cache tag. It also matches the existing
> use of the period in the file name to delineate semantically
> different concepts.
>
> For example, the bytecode file name of ``importlib.cpython-35.pyc``
> would become ``importlib.cpython-35.opt-0.pyc``. If ``-OO`` had been
> passed to the interpreter then instead of
> ``importlib.cpython-35.pyo`` the file name would be
> ``importlib.cpython-35.opt-2.pyc``.
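The naming rule can be sketched as a small function. ``proposed_bytecode_name`` is a hypothetical helper that mirrors the proposed expression with the cache tag and level passed in explicitly; it is not the real ``importlib.util.cache_from_source()``:

```python
import sys


def proposed_bytecode_name(module_name, cache_tag, optimization):
    """Hypothetical helper: '{name}.{cache_tag}.opt-{optimization}.pyc'."""
    return '{name}.{cache_tag}.opt-{optimization}.pyc'.format(
        name=module_name,
        cache_tag=cache_tag,
        optimization=str(optimization))


print(proposed_bytecode_name('importlib', 'cpython-35', 0))
# importlib.cpython-35.opt-0.pyc
print(proposed_bytecode_name('importlib', 'cpython-35', 2))
# importlib.cpython-35.opt-2.pyc

# With the running interpreter's own cache tag and -O level:
print(proposed_bytecode_name('importlib', sys.implementation.cache_tag,
                             sys.flags.optimize))
```
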
>
>
> Implementation
> ==============
>
> importlib
> ---------
>
> As ``importlib.util.cache_from_source()`` is the API that exposes
> bytecode file paths as well as being directly used by importlib, it
> requires the most critical change. As of Python 3.4, the function's
> signature is::
>
>   importlib.util.cache_from_source(path, debug_override=None)
>
> This PEP proposes changing the signature in Python 3.5 to::
>
>   importlib.util.cache_from_source(path, debug_override=None, *, optimization=None)
>
> The introduced ``optimization`` keyword-only parameter will control
> what optimization level is specified in the file name. If the
> argument is ``None`` then the current optimization level of the
> interpreter will be assumed. Any argument given for ``optimization``
> will be passed to ``str()`` and must have ``str.isalnum()`` be true,
> else ``ValueError`` will be raised (this prevents invalid characters
> being used in the file name). It is expected that beyond Python's own
> 0-2 optimization levels, third-party code will use a hash of
> optimization names to specify the optimization level, e.g.
> ``hashlib.sha256(','.join(['dead code elimination',
> 'constant folding']).encode()).hexdigest()``.
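A minimal sketch of the validation rule and of a third-party hash tag. ``validate_optimization`` is a hypothetical helper, not part of importlib; it only applies the ``str()`` plus ``str.isalnum()`` check described here. Note that ``hashlib.sha256()`` accepts bytes, so the joined names must be encoded first:

```python
import hashlib


def validate_optimization(optimization):
    """Hypothetical check: str(optimization) must be alphanumeric,
    otherwise ValueError is raised (keeps file names safe)."""
    tag = str(optimization)
    if not tag.isalnum():
        raise ValueError('{!r} is not alphanumeric'.format(tag))
    return tag


# A third-party optimizer hashing the names of the optimizations it applies.
custom = hashlib.sha256(
    ','.join(['dead code elimination', 'constant folding']).encode()
).hexdigest()

print(validate_optimization(2))  # 2
print(validate_optimization(custom)[:8])
try:
    validate_optimization('opt/1')
except ValueError as exc:
    print('rejected:', exc)  # rejected: 'opt/1' is not alphanumeric
```

A hex digest contains only ``0-9a-f``, so it always passes the check.
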
>
> The ``debug_override`` parameter will be deprecated. As the parameter
> expects a boolean, the integer value of the boolean will be used as
> if it had been provided as the argument to ``optimization`` (a
> ``None`` argument will mean the same as for ``optimization``). A
> deprecation warning will be raised when ``debug_override`` is given a
> value other than ``None``, but there are no plans for the complete
> removal of the parameter at this time (but removal will be no later
> than Python 4).
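A minimal sketch of the argument handling described above, assuming a hypothetical helper ``resolve_optimization`` that returns the tag placed after ``opt-``. A non-``None`` ``debug_override`` triggers a ``DeprecationWarning`` and its integer value is used as the level; ``None`` falls back to ``sys.flags.optimize``. Rejecting calls that pass both parameters is an assumption of this sketch; the draft does not say what happens in that case:

```python
import sys
import warnings


def resolve_optimization(debug_override=None, optimization=None):
    """Hypothetical sketch of the draft's cache_from_source() arguments."""
    if debug_override is not None:
        if optimization is not None:
            # Assumption: the draft leaves this combination unspecified.
            raise TypeError('pass debug_override or optimization, not both')
        warnings.warn('debug_override is deprecated; use optimization',
                      DeprecationWarning, stacklevel=2)
        # Per the draft, the boolean's integer value becomes the level.
        optimization = int(debug_override)
    if optimization is None:
        # None means "the interpreter's current optimization level".
        optimization = sys.flags.optimize
    return str(optimization)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    print(resolve_optimization(debug_override=False))  # 0
print(caught[0].category.__name__)  # DeprecationWarning
print(resolve_optimization(optimization=2))  # 2
```
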
>
> The various module attributes in ``importlib.machinery`` which relate
> to bytecode file suffixes will be updated [7]_. The
> ``DEBUG_BYTECODE_SUFFIXES`` and ``OPTIMIZED_BYTECODE_SUFFIXES``
> attributes will both be documented as deprecated and set to the same
> value as ``BYTECODE_SUFFIXES`` (removal of ``DEBUG_BYTECODE_SUFFIXES``
> and ``OPTIMIZED_BYTECODE_SUFFIXES`` is not currently planned, but will
> be no later than Python 4).
>
> The various finders and loaders will also be updated as necessary,
> but updating the previously mentioned parts of importlib should be all
> that is required.
>
>
> Rest of the standard library
> ----------------------------
>
> The various functions exposed by the ``py_compile`` and
> ``compileall`` modules will be updated as necessary to make sure
> they follow the new bytecode file name semantics [6]_, [1]_.
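A sketch of what the updated ``py_compile`` behaviour looks like in practice, assuming an interpreter where this naming is implemented (CPython 3.5 and later write ``-OO`` bytecode as ``<name>.<cache_tag>.opt-2.pyc``). It uses the real ``py_compile.compile()`` ``optimize`` parameter and the ``optimization`` keyword of ``importlib.util.cache_from_source()``; the file name ``example.py`` is arbitrary:

```python
import importlib.util
import os
import py_compile
import tempfile

# Assumption: CPython 3.5+, which names level-2 bytecode "...opt-2.pyc".
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, 'example.py')
    with open(source, 'w') as f:
        f.write('"""Docstring removed at optimization level 2."""\n')
    # optimize=2 compiles as if the interpreter were run with -OO;
    # compile() returns the path of the bytecode file it wrote.
    written = py_compile.compile(source, optimize=2, doraise=True)
    expected = importlib.util.cache_from_source(source, optimization=2)
    print(os.path.basename(written))  # example.<cache_tag>.opt-2.pyc
    print(written == expected)  # True
```
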
>
>
> Compatibility Considerations
> ============================
>
> Any code directly manipulating bytecode files from Python 3.2 on
> will need to consider the impact of this change on their code (prior
> to Python 3.2 -- including all of Python 2 -- there was no
> ``__pycache__``, which already necessitates bifurcating bytecode file
> handling support). If code was setting the ``debug_override``
> argument to ``importlib.util.cache_from_source()`` then care will be
> needed if they want the path to a bytecode file with an optimization
> level of 2. Otherwise only code **not** using
> ``importlib.util.cache_from_source()`` will need updating.
>
> As for people who distribute bytecode-only modules, they will have
> to choose which optimization level to compile their bytecode files
> at, since distributing a ``.pyo`` file with a ``.pyc`` file will no
> longer be of any use. Since people typically only distribute bytecode
> files for code obfuscation purposes or smaller distribution size,
> only having to distribute a single ``.pyc`` should actually be
> beneficial to these use-cases.
>
>
> Rejected Ideas
> ==============
>
> N/A
>
>
> Open Issues
> ===========
>
> Formatting of the optimization level in the file name
> -----------------------------------------------------
>
> Using the "opt-" prefix and placing the optimization level between
> the cache tag and file extension is not critical. Other options which
> were considered are:
>
> * ``importlib.cpython-35.o0.pyc``
> * ``importlib.cpython-35.O0.pyc``
> * ``importlib.cpython-35.0.pyc``
> * ``importlib.cpython-35-O0.pyc``
> * ``importlib.O0.cpython-35.pyc``
> * ``importlib.o0.cpython-35.pyc``
> * ``importlib.0.cpython-35.pyc``
>
> These were initially rejected because they would either change the
> sort order of bytecode files, introduce possible ambiguity with the
> cache tag, or not be self-documenting enough.
>
>
> References
> ==========
>
> .. [1] The compileall module
>    (https://docs.python.org/3/library/compileall.html#module-compileall)
>
> .. [2] The astoptimizer project
>    (https://pypi.python.org/pypi/astoptimizer)
>
> .. [3] ``importlib.util.cache_from_source()``
>    (https://docs.python.org/3.5/library/importlib.html#importlib.util.cache_from_source)
>
> .. [4] Implementation of ``importlib.util.cache_from_source()`` from
>    CPython 3.4.3rc1
>    (https://hg.python.org/cpython/file/038297948389/Lib/importlib/_bootstrap.py#l437)
>
> .. [5] PEP 3147, PYC Repository Directories, Warsaw
>    (http://www.python.org/dev/peps/pep-3147)
>
> .. [6] The py_compile module
>    (https://docs.python.org/3/library/py_compile.html#module-py_compile)
>
> .. [7] The importlib.machinery module
>    (https://docs.python.org/3/library/importlib.html#module-importlib.machinery)
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End:
>
>


-- 
--Guido van Rossum (python.org/~guido)

