[Python-checkins] peps: Add PEP 438: Transitioning to release-file hosting on PyPI, submitted by Holger Krekel

georg.brandl python-checkins at python.org
Fri Mar 15 22:51:32 CET 2013


http://hg.python.org/peps/rev/021c1f97ee13
changeset:   4803:021c1f97ee13
user:        Georg Brandl <georg at python.org>
date:        Fri Mar 15 22:51:25 2013 +0100
summary:
  Add PEP 438: Transitioning to release-file hosting on PyPI, submitted by Holger Krekel.

files:
  pep-0438.txt |  387 +++++++++++++++++++++++++++++++++++++++
  1 files changed, 387 insertions(+), 0 deletions(-)


diff --git a/pep-0438.txt b/pep-0438.txt
new file mode 100644
--- /dev/null
+++ b/pep-0438.txt
@@ -0,0 +1,387 @@
+PEP: 438
+Title: Transitioning to release-file hosting on PyPI
+Version: $Revision$
+Last-Modified: $Date$
+Author: Holger Krekel <holger at merlinux.eu>, Carl Meyer <carl at oddbird.net>
+Discussions-To: catalog-sig at python.org
+Status: Draft
+Type: Process
+Content-Type: text/x-rst
+Created: 15-Mar-2013
+Post-History:
+
+
+Abstract
+========
+
+This PEP proposes a backward-compatible two-phase transition process
+to make installing from the pypi.python.org (PyPI) package index
+faster, simpler and more robust.  To ease the transition and
+minimize client-side friction, **no changes to distutils or existing
+installation tools are required in order to benefit from the first
+transition phase, which will result in faster, more reliable installs
+for most existing packages**.
+
+The first transition phase implements an easy and explicit means for a
+package maintainer to control which release file links are served to
+present-day installation tools.  The first phase also includes the
+implementation of analysis tools for present-day packages, to support
+communication with package maintainers and the automated setting of
+default modes for controlling release file links.  The first phase
+also will default newly-registered projects on PyPI to only serve
+links to release files which were uploaded to PyPI.
+
+The second transition phase concerns end-user installation tools,
+which shall default to only install release files that are hosted on
+PyPI and tell the user if external release files exist, offering a
+choice to automatically use those external files.
+
+
+Rationale
+=========
+
+.. _history:
+
+History and motivations for external hosting
+--------------------------------------------
+
+When PyPI went online, it offered release registration but had no
+facility to host release files itself.  When hosting was added, no
+automated downloading tool existed yet.  When Phillip Eby implemented
+automated downloading (through setuptools), he chose to allow people
+to use download hosts of their own choosing.  Finding
+externally-hosted packages was implemented as follows:
+
+#. The PyPI ``simple/`` index for a package contains all links found
+   by scraping them from that package's long_description metadata for
+   any release. Links in the "Download-URL" and "Home-page" metadata
+   fields are given ``rel=download`` and ``rel=homepage`` attributes,
+   respectively.
+
+#. Any of these links whose target is a file whose name appears to be
+   in the form of an installable source or binary distribution, with
+   name in the form "packagename-version.ARCHIVEEXT", is considered a
+   potential installation candidate by installation tools.
+
+#. Similarly, any link suffixed with an "#egg=packagename-version"
+   fragment is considered an installation candidate.
+
+#. Additionally, the ``rel=homepage`` and ``rel=download`` links are
+   crawled by installation tools and, if HTML, are themselves scraped
+   for release-file links in the above formats.
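
The legacy link-finding rules above can be approximated with a short Python sketch. It is illustrative only: the list of archive extensions, the case-insensitive prefix match on the project name, and the ``(url, rel)`` input format are assumptions made for this example, not the exact logic of setuptools or any other installer.

```python
from urllib.parse import urlsplit

# Assumption: a small subset of archive extensions; real installers know more.
ARCHIVE_EXTS = (".tar.gz", ".tar.bz2", ".tgz", ".zip", ".egg")


def is_candidate(url, project):
    """Rules 2 and 3: does ``url`` look like a release file of ``project``?"""
    parts = urlsplit(url)
    prefix = project.lower() + "-"
    # Rule 3: an "#egg=packagename-version" fragment marks a candidate.
    if parts.fragment.startswith("egg="):
        return parts.fragment[len("egg="):].lower().startswith(prefix)
    # Rule 2: the file name must look like "packagename-version.ARCHIVEEXT".
    # Assumption: a simple prefix match stands in for real name parsing.
    filename = parts.path.rsplit("/", 1)[-1].lower()
    return filename.startswith(prefix) and filename.endswith(ARCHIVE_EXTS)


def classify(links, project):
    """Split (url, rel) pairs from a simple/ page into candidates and pages to crawl."""
    candidates, to_crawl = [], []
    for url, rel in links:
        if is_candidate(url, project):
            candidates.append(url)
        elif rel in ("homepage", "download"):
            # Rule 4: these pages are fetched and, if HTML, scraped for more links.
            to_crawl.append(url)
    return candidates, to_crawl


links = [
    ("http://example.com/foo-1.0.tar.gz", None),
    ("http://example.com/", "homepage"),
    ("http://example.com/dl/", "download"),
    ("http://svn.example.com/trunk#egg=foo-dev", None),
    ("http://example.com/docs.html", None),
]
candidates, to_crawl = classify(links, "foo")
```

Note that every ``homepage`` and ``download`` link ends up in ``to_crawl`` even when the project hosts all of its files on PyPI, which is the cost described in the Problem section below.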
+
+Today, most packages released on PyPI host their release files on
+PyPI, but a small percentage (XXX need updated data) rely on external
+hosting.
+
+There are many reasons [2]_ why people have chosen external
+hosting. To cite just a few:
+
+- release processes and scripts have been developed already and upload
+  to external sites
+
+- it takes too long to upload large files from some places in the
+  world
+
+- export restrictions e.g. for crypto-related software
+
+- company policies which require offering open source packages through
+  their own sites
+
+- problems with integrating uploading to PyPI into one's release
+  process (because of release policies)
+
+- desiring download statistics different from those maintained by PyPI
+
+- perceived poor reliability of PyPI
+
+- not being aware that PyPI offers file hosting
+
+Irrespective of the present-day validity of these reasons, there
+clearly is a history of people choosing to host files externally, and
+for some time it was even the only way to do things.  This PEP takes
+the position that there are at least some valid reasons for external
+hosting.
+
+Problem
+-------
+
+**Today, Python package installers (pip, easy_install, buildout, and
+others) often need to query many non-PyPI URLs even if there are no
+externally hosted files**.  In addition to querying pypi.python.org's
+simple index pages, an installer also crawls all homepages and
+download pages ever specified with any release of a package.
+The need for installers to crawl external sites slows down
+installation and makes for a brittle and unreliable installation
+process.  Those sites and packages also don't take part in the
+:pep:`381` mirroring infrastructure, further decreasing reliability
+and speed of automated installation processes around the world.
+
+Most packages are hosted directly on pypi.python.org [1]_.  Even for
+these packages, installers still crawl their homepage and
+download-url, if specified.  Many package uploaders are not aware that
+specifying the "homepage" or "download-url" in their package metadata
+will needlessly slow down the installation process for all users.
+
+Relying on third-party sites also opens up more attack vectors for
+injecting malicious packages into sites using automated installs.  A
+simple attack might just involve getting hold of an old, now-unused
+homepage domain and placing malicious packages there.  Moreover,
+performing a man-in-the-middle (MITM) attack between an installation
+site and any of the download sites can inject malicious packages on
+the installation site.  As many homepages and download locations use
+HTTP rather than HTTPS, such attacks are not hard to launch.  Such
+MITM attacks can easily happen even for packages which never intended
+to host files externally, as their homepages are contacted by
+installers anyway.
+
+There is currently no way for package maintainers to avoid
+external-link crawling, other than removing all homepage/download URL
+metadata for all historic releases.  While a script [3]_ has been
+written to perform this action, it is not a good general solution
+because it removes useful metadata from PyPI releases.
+
+Even if the sites referenced by "Homepage" and "Download-URL" links
+were not scraped for further links, there is no obvious way under the
+current system for a package owner to link to an installable file from
+a long_description metadata field (which is shown as package
+documentation on ``/pypi/PKG``) without installation tools
+automatically considering that file a candidate for installation.
+Conversely, there is no way to explicitly register multiple external
+release files without putting them in metadata fields.
+
+
+Goals
+-----
+
+These are the goals to be achieved by implementation of this PEP:
+
+* Package owners should be able to explicitly control which files are
+  presented by PyPI to installer tools as installation
+  candidates. Installation should not be slowed and made less reliable
+  by extensive and unnecessary crawling of links that package owners
+  did not explicitly nominate as installation files.
+
+* It should remain possible for package owners to choose to host their
+  release files on their own hosting, external to PyPI. It should be
+  easy for a user to request the installation of such releases using
+  automated installer tools.
+
+* Automated installer tools should not install externally-hosted
+  packages **by default**, but only when explicitly authorized to do
+  so by the user. When tools refuse to install such a package by
+  default, they should tell the user exactly which external link(s)
+  they would need to follow, and what option(s) the user can provide
+  to authorize the tool to follow those links. PyPI should provide all
+  necessary metadata for installer tools to implement this easily and
+  within a single request/reply interaction.
+
+* Migration from the status quo to the above points should be gradual
+  and minimize breakage. This includes tooling that makes it easy for
+  package owners with an existing release process that uploads to
+  non-PyPI hosting to also upload those release files to PyPI.
+
+
+Solution / two transition phases
+================================
+
+The first transition phase introduces a "hosting-mode" field for each
+project on PyPI, allowing package owners explicit control of which
+release file links are served to present-day installation tools in the
+machine-readable ``simple/`` index. The first transition will, after
+successful hosting-mode manipulations by individual early adopters,
+set a default hosting mode for existing packages, based on automated
+analysis.  **Maintainers will be notified one month ahead of any such
+automated change**.  At completion of the first transition phase,
+**all present-day existing release and installation processes and
+tools are expected to continue working**.  Any remaining errors or
+problems are expected to only relate to installation of individual
+packages and can be easily corrected by package maintainers or PyPI
+admins if maintainers are not reachable.
+
+Also in the first phase, each link served in the ``simple/`` index
+will be explicitly marked as ``rel="internal"`` (hosted by the index
+itself) or ``rel="external"`` (linking to an external site that is not
+part of the index).
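
A minimal sketch of how a ``simple/`` page could tag each link with ``rel="internal"`` or ``rel="external"``. The text above does not say how PyPI decides that a file is "hosted by the index itself"; treating relative links and links on an assumed ``INDEX_HOST`` as internal is a hypothetical rule chosen for illustration, and ``render_link`` is a hypothetical helper.

```python
from html import escape
from urllib.parse import urlsplit

INDEX_HOST = "pypi.python.org"  # assumed host name of the index


def render_link(url):
    """Render one simple/ index anchor tagged rel="internal" or rel="external"."""
    host = urlsplit(url).hostname
    # Assumption: relative links and links on the index host are hosted by the index.
    rel = "internal" if host in (None, INDEX_HOST) else "external"
    # Display the file name, ignoring any "#egg=..." or "#md5=..." fragment.
    name = url.split("#", 1)[0].rsplit("/", 1)[-1]
    return '<a href="%s" rel="%s">%s</a>' % (escape(url), rel, escape(name))
```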
+
+In the second transition phase, PyPI client installation tools shall
+be updated to default to only install ``rel="internal"`` packages
+unless a user specifies option(s) to permit installing from external
+links.
+
+Maintainers of packages which currently host release files on non-PyPI
+sites shall receive instructions and tools to ease "re-hosting" of
+their historic and future package release files.  This re-hosting tool
+MUST be available before automated hosting-mode changes are announced
+to package maintainers.
+
+
+Implementation
+==============
+
+Hosting modes
+-------------
+
+The foundation of the first transition phase is the introduction of
+three "modes" of PyPI hosting for a package, affecting which links are
+generated for the ``simple/`` index.  These modes are implemented by
+changing the algorithm that generates the machine-readable
+``simple/`` index, without requiring changes to installation tools.
+
+The modes are:
+
+- ``pypi-scrape-crawl``: no change from the current situation of
+  generating machine-readable links for installation tools, as
+  outlined in the history_.
+
+- ``pypi-scrape``: for a package in this mode, links to be added to
+  the ``simple/`` index are still scraped from package
+  metadata. However, the "Home-page" and "Download-URL" links are
+  given ``rel=ext-homepage`` and ``rel=ext-download`` attributes
+  instead of ``rel=homepage`` and ``rel=download``. The effect of this
+  (with no change in installation tools necessary) is that these links
+  will not be followed and scraped for further candidate links by
+  present-day installation tools: only installable files directly
+  hosted from PyPI or linked directly from PyPI metadata will be
+  considered for installation.  Installation tools MAY evolve to offer
+  an option to use the new rel attributes to crawl external pages, but
+  MUST NOT default to it.
+
+- ``pypi-explicit``: for a package in this mode, only links to release
+  files uploaded to PyPI, and external links to release files
+  explicitly nominated by the package owner (via a new interface
+  exposed by PyPI) will be added to the ``simple/`` index.
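
The three hosting modes can be summarised as a function from a project's metadata to the ``(url, rel)`` pairs its ``simple/`` page serves. This is a sketch under stated assumptions: the ``Project`` class and its field names are hypothetical, not PyPI's schema, and how the ``homepage``/``download`` rel values combine with the ``internal``/``external`` labels is left open here, so those two links carry only their own rel value.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

MODES = ("pypi-scrape-crawl", "pypi-scrape", "pypi-explicit")


@dataclass
class Project:
    """Hypothetical model of a PyPI project; not PyPI's actual schema."""
    mode: str
    uploaded: List[str] = field(default_factory=list)  # files hosted on PyPI
    scraped: List[str] = field(default_factory=list)  # found in long_description
    explicit: List[str] = field(default_factory=list)  # owner-nominated external URLs
    homepage: Optional[str] = None
    download_url: Optional[str] = None


def simple_links(p: Project) -> List[Tuple[str, str]]:
    """Return the (url, rel) pairs a simple/ page would serve in each mode."""
    if p.mode not in MODES:
        raise ValueError("unknown hosting mode: %r" % p.mode)
    links = [(url, "internal") for url in p.uploaded]
    if p.mode == "pypi-explicit":
        # Only uploaded files plus explicitly nominated external files.
        return links + [(url, "external") for url in p.explicit]
    links += [(url, "external") for url in p.scraped]
    # pypi-scrape renames the rels so present-day tools stop crawling these pages.
    prefix = "ext-" if p.mode == "pypi-scrape" else ""
    if p.homepage:
        links.append((p.homepage, prefix + "homepage"))
    if p.download_url:
        links.append((p.download_url, prefix + "download"))
    return links
```

Only the ``pypi-scrape-crawl`` branch emits the plain ``homepage`` and ``download`` rels that present-day installers follow, which is why moving a project out of that mode needs no client changes.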
+
+Thus the hope is that eventually all projects on PyPI can be migrated
+to the ``pypi-explicit`` mode, while preserving the ability to install
+release files hosted externally via installer tools. Deprecation of
+hosting modes to eventually only allow the ``pypi-explicit`` mode is
+NOT REGULATED by this PEP but is expected to become feasible some time
+after successful implementation of the transition phases described in
+this PEP.  Deprecation is expected to require **a new process to deal
+with abandoned packages**, because some still-popular packages have
+unreachable maintainers.
+
+
+First transition phase (PyPI)
+-----------------------------
+
+The proposed solution consists of multiple implementation and
+communication steps:
+
+#. Implement in PyPI the three modes described above, with an
+   interface for package owners to select the mode for each package
+   and register explicit external file URLs.
+
+#. For packages in all modes, label all links in the ``simple/`` index
+   with ``rel="internal"`` or ``rel="external"``, to make it easier
+   for client tools to distinguish the types of links in the second
+   transition phase.
+
+#. Default all newly-registered packages to ``pypi-explicit`` mode
+   (package owners can still switch to the other modes as desired).
+
+#. Determine (via an automated analysis tool) which packages have all
+   installable files available on PyPI itself (group A), which have
+   all installable files linked directly from PyPI metadata (group B),
+   and which have installable versions available that are linked only
+   from external homepage/download HTML pages (group C).
+
+#. Send mail to maintainers of projects in group A informing them
+   that their project will be automatically configured to
+   ``pypi-explicit`` mode in one month, and similarly to maintainers
+   of projects in group B that their project will be automatically
+   configured to ``pypi-scrape`` mode.  Inform them that this change
+   is not expected to affect
+   installability of their project at all, but will result in faster
+   and safer installs for their users.  Encourage them to set this
+   mode themselves sooner to benefit their users.
+
+#. Send mail to maintainers of packages in group C informing them
+   that their package hosting mode is ``pypi-scrape-crawl``, listing
+   the URLs which are currently crawled, and suggesting that they
+   either re-host their packages directly on PyPI and switch to
+   ``pypi-explicit``, or at least provide direct links to release
+   files in PyPI metadata and switch to ``pypi-scrape``.  Provide
+   instructions and tools to help with these transitions.
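
The analysis in step 4 and the mode proposals in steps 5 and 6 can be sketched as follows. The per-file source labels (``"pypi"``, ``"metadata"``, ``"crawled"``) are assumed outputs of the analysis tool, and group B is read as "every installable file is either on PyPI or linked directly from PyPI metadata"; both are interpretations for illustration, and ``classify_project`` and ``plan`` are hypothetical names.

```python
# Assumed labels for where the analysis tool found each release file:
#   "pypi"     - uploaded to PyPI
#   "metadata" - linked directly from PyPI metadata
#   "crawled"  - reachable only via external homepage/download pages
PROPOSED_MODE = {"A": "pypi-explicit", "B": "pypi-scrape", "C": "pypi-scrape-crawl"}


def classify_project(file_sources):
    """Map {filename: source} for one project to group "A", "B" or "C"."""
    sources = set(file_sources.values())
    if sources <= {"pypi"}:
        return "A"
    if sources <= {"pypi", "metadata"}:
        return "B"
    return "C"


def plan(projects):
    """Yield (project, group, mode) tuples that drive the notification mails."""
    for name, file_sources in sorted(projects.items()):
        group = classify_project(file_sources)
        yield name, group, PROPOSED_MODE[group]
```

Group C keeps ``pypi-scrape-crawl``, so its maintainers receive the list of crawled URLs rather than an automatic mode change.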
+
+
+Second transition phase (installer tools)
+-----------------------------------------
+
+For the second transition phase, maintainers of installation tools are
+asked to release two updates.
+
+The first update shall provide clear warnings if externally-hosted
+release files (that is, files whose link is ``rel="external"``) are
+selected for download, stating exactly which projects and URLs are
+affected, and warn that in future versions externally-hosted
+downloads will be disabled by default.
+
+The second update should change the default mode to allow only
+installation of ``rel="internal"`` package files, and allow
+installation of externally-hosted packages only when the user supplies
+an option (ideally an option specifying exactly which external domains
+are to be trusted as download sources). When download of an
+externally-hosted package is disallowed, the user should be notified,
+with instructions for how to make the install succeed and warnings
+about the implication (that a file will be downloaded from a site that
+is not part of the package index).
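
On the client side, the default proposed for the second update amounts to filtering the parsed ``simple/`` page by ``rel`` value. The sketch below uses only the standard library; the ``trusted_hosts`` parameter stands in for the per-domain option suggested above and is a hypothetical name, not an existing installer option.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class SimpleIndexParser(HTMLParser):
    """Collect (href, rel) pairs from the anchors of a simple/ page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            a = dict(attrs)
            if a.get("href"):
                self.links.append((a["href"], a.get("rel") or ""))


def select_links(html, trusted_hosts=()):
    """Allow internal links, plus external links whose host the user trusts."""
    parser = SimpleIndexParser()
    parser.feed(html)
    allowed, refused = [], []
    for href, rel in parser.links:
        rels = rel.split()
        if "internal" in rels:
            allowed.append(href)
        elif "external" in rels:
            host = urlsplit(href).hostname
            (allowed if host in trusted_hosts else refused).append(href)
        # Links carrying neither value (e.g. rel="homepage") are ignored here.
    return allowed, refused
```

A real tool would report each refused URL together with the option needed to trust its host; under the first update it would only warn about such links instead of refusing them.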
+
+
+Open Questions / tasks
+======================
+
+- Should we introduce some form of PyPI API versioning in this PEP?
+  (it might complicate matters and delay the implementation but is
+  often seen as good practice).
+
+- Do another round of discussions with installation tool authors and
+  see about incorporating their feedback. There is one known issue in
+particular from Phillip J. Eby, who considers a host-based pattern
+  matching algorithm preferable to interpreting "rel" attributes.
+
+
+References
+==========
+
+.. [1] Donald Stufft, ratio of externally hosted versus pypi-hosted,
+       http://mail.python.org/pipermail/catalog-sig/2013-March/005549.html
+       (XXX need to update this data for all easy_install-supported formats)
+
+.. [2] Marc-Andre Lemburg, reasons for external hosting,
+       http://mail.python.org/pipermail/catalog-sig/2013-March/005626.html
+
+.. [3] Holger Krekel, script to remove homepage/download metadata for
+       all releases
+       http://mail.python.org/pipermail/catalog-sig/2013-February/005423.html
+
+
+Acknowledgments
+===============
+
+Phillip Eby for precise information and the basic ideas to implement
+the transition via server-side changes only.
+
+Donald Stufft for pushing away from external hosting and offering to
+implement both a Pull Request for the necessary PyPI changes and the
+analysis tool to drive the transition phase 1.
+
+Marc-Andre Lemburg, Nick Coghlan and catalog-sig in general for
+thinking through issues regarding getting rid of "external hosting".
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End:

-- 
Repository URL: http://hg.python.org/peps

