Notice: While Javascript is not essential for this website, your interaction with the content will be limited. Please turn Javascript on for the full experience.

PEP 625 -- File name of a Source Distribution

PEP:625
Title:File name of a Source Distribution
Author:Tzu-ping Chung <uranusjr at gmail.com>, Paul Moore <p.f.moore at gmail.com>
Discussions-To:https://discuss.python.org/t/draft-pep-file-name-of-a-source-distribution/4686
Status:Draft
Type:Standards Track
Created:08-Jul-2020
Post-History:08-Jul-2020

Abstract

This PEP describes a standard naming scheme for a Source Distribution, also known as an sdist. This scheme distinguishes an sdist from an arbitrary archive file containing source code of Python packages, and can be used to communicate information about the distribution to packaging tools.

A standard sdist specified here is a gzipped tar file with a specially formatted file stem and a .sdist suffix. This PEP does not specify the contents of the tarball.

Motivation

An sdist is a Python package distribution that contains "source code" of the Python package, and requires a build step to be turned into a wheel on installation. This format is often considered as an unbuilt counterpart of a PEP 427 wheel, and given special treatments in various parts of the packaging ecosystem.

Compared to wheel, however, the sdist is entirely unspecified, and currently works by convention. The widely accepted format of an sdist is defined by the implementation of distutils and setuptools, which creates a source code archive in a predictable format and file name scheme. Installers exploit this predictability to assign this format certain contextual information that helps the installation process. pip, for example, parses the file name of an sdist from a PEP 503 index, to obtain the distribution's project name and version for dependency resolution purposes. But due to the lack of specification, the installer does not have any guarantee as to the correctness of the inferred message, and must verify it at some point by locally building the distribution metadata.

This build step is awkward for a certain class of operations, when the user does not expect the build process to occur. pypa/pip#8387 [1] describes an example. The command pip download --no-deps --no-binary=numpy numpy is expected to only download an sdist for numpy, since we do not need to check for dependencies, and both the name and version are available by introspecting the downloaded file name. pip, however, cannot assume the downloaded archive follows the convention, and must build and check the metadata. For a PEP 518 project, this means running the prepare_metadata_for_build_wheel hook specified in PEP 517, which incurs significant overhead.

Rationale

By creating a special file name scheme for the sdist format, this PEP frees up tools from the time-consuming metadata verification step when they only need the metadata available in the file name.

This PEP also serves as the formal specification to the long-standing file name convention used by the current sdist implementations. The file name contains the distribution name and version, to aid tools identifying a distribution without needing to download, unarchive the file, and perform costly metadata generation for introspection, if all the information they need is available in the file name.

Specification

The name of an sdist should be {distribution}-{version}.sdist.

  • distribution is the name of the distribution as defined in PEP 345, and normalised according to PEP 503, e.g. 'pip', 'flit-core'.
  • version is the version of the distribution as defined in PEP 440, e.g. 20.2.

Each component is escaped according to the same rules as PEP 427.

An sdist must be a gzipped tar archive that is able to be extracted by the standard library tarfile module with the open flag 'r:gz'.

Backwards Compatibility

The new file name scheme should not incur backwards incompatibility in existing tools. Installers are likely to have already implemented logic to exclude extensions they do not understand, since they already need to deal with legacy formats on PyPI such as .rpm and .egg. They should be able to correctly ignore files with extension .sdist.

pip, for example, skips this extension with the following debug message:

Skipping link: unsupported archive format: sdist: <URL to file>

While setuptools ignores it silently.

Rejected Ideas

Create specification for sdist metadata

The topic of creating a trustworthy, standard sdist metadata format as a means to distinguish sdists from arbitrary archive files has been raised and discussed multiple times, but has yet to make significant progress due to the complexity of potential metadata inconsistency between an sdist and a wheel built from it.

This PEP does not exclude the possibility of creating a metadata specification for sdists in the future. But by specifying only the file name of an sdist, a tool can reliably identify an sdist, and perform useful introspection on its identity, without going into the details required for metadata specification.

Use a currently common sdist naming scheme

There is a currently established practice to name an sdist in the format of {distribution}-{version}.[tar.gz|zip].

Popular source code management services use a similar scheme to name the downloaded source archive. GitHub, for example, uses distribution-1.0.zip as the archive name containing source code of repository distribution on branch 1.0. Giving this scheme a special meaning would cause confusion since a source archive may not a valid sdist.

Augment a currently common sdist naming scheme

A scheme {distribution}-{version}.sdist.tar.gz was raised during the initial discussion. This was abandoned due to backwards compatibility issues with currently available installation tools. pip 20.1, for example, would parse distribution-1.0.sdist.tar.gz as project distribution with version 1.0.sdist. This would cause the sdist to be downloaded, but fail to install due to inconsistent metadata.

The same problem exists for all common archive suffixes. To avoid confusing old installers, the sdist scheme must use a suffix that they do not identify as an archive.

Source: https://github.com/python/peps/blob/master/pep-0625.rst