Title: Specifying Installation Requirements for Python Projects
Author: Brett Cannon <brett at python.org>, Pradyun Gedam <pradyunsg at gmail.com>, Tzu-ping Chung <uranusjr at gmail.com>
- Backwards Compatibility
- Security Implications
- How to Teach This
- Reference Implementation
- Rejected Ideas
- File Formats Other Than TOML
- Alternative Name to pyproject-lock.d
- Supporting a Single Lock File
- Using a Flat List Instead of a Dependency Graph
- Being Concerned About Different Dependencies Per Wheel File For a Project
- Use Wheel Tags in the File Name
- Using Semantic Versioning for version
- Alternative Names for needs
- Alternative Names for needed-by
- Only Allowing a Single Code Location For a Project
- Support for Branches and Tags for Git
- Accepting PEP 650
- Open Issues
- Allow for Tool-Specific type Values
- Support Variable Expansion in the url field
- Don't Require Lock Files Be in a pyproject-lock.d directory
- Record the Date of When the Lock File was Generated
- Locking Build Dependencies
- Recording the Requires-Dist Input to the Locker's Resolver
- Providing marker and tags per Package
This PEP specifies a file format to list the Python package installation requirements for a project. The list of projects is considered exhaustive for the installation target and thus locked down, not requiring any information beyond the platform being installed for and the lock file listing the required dependencies to perform a successful installation of dependencies.
Thanks to PEP 621, projects have a way to list their direct/top-level dependencies which they need to have installed. But PEP 621 also (purposefully) omits two key details that often become important for projects:
- A listing of all indirect/transitive dependencies
- Specifying (at least) specific versions of dependencies for reproducible installations
Both needs can be important for various reasons when creating a new environment. Consider a project which is an application that is deployed somewhere (either to users as a desktop app or to a server). Without a complete listing of all dependencies and the specific versions to use, there can be a skew between developers of the same project, or developer and user, based on what versions of a project's dependencies happen to be available at the time of installation in a new environment. For instance, a dependency may have v1 as the newest version on Monday when one developer installed the dependency, while v2 comes out on Wednesday when another developer installs the same dependency. Now the two developers are working against two different versions of the same dependency, which can lead to different outcomes. This is the use-case of developing a desktop or server application where one might have a requirements.txt file which specifies exact versions of various packages.
Another important reason for reproducible installations is for security purposes. Guaranteeing that the same binary data is downloaded and installed for all installations of an app makes sure that no bad actor has somehow changed a dependency's binary data in a malicious way. A lock file can assist in this guarantee by recording the exact details of what should be installed and how to verify that those dependencies have not changed any bytes unexpectedly. This is the use-case of developing a secure application using a requirements.txt file which specifies the hash of all the packages that should be installed.
Tied into this concept of reproducibility is the speed at which an environment can be recreated. If you created a lock file as part of your local development, it can be used to speed up recreating that development environment by minimizing both network queries and the scope of dependency resolution. Recreating your local development environment becomes faster because less work is required to calculate what dependencies to install. This is the use-case of working on a library or similar project where the lock file is not committed to version control and is instead used as a local cache of installation resolution details, such as an uncommitted poetry.lock file.
The community itself has also shown a need for lock files based on the fact that multiple tools have independently created their own lock file formats:
Other programming language communities have also shown the usefulness of lock files by developing their own solution to this problem. Some of those communities include:
Below, we identify some use-cases applicable to stakeholders in the Python community and anyone who interacts with Python package installers who are the ultimate consumers of a lock file (this is not considered exhaustive and is borrowed from PEP 650).
Providers are the parties (organization, person, community, etc.) that supply a service or software tool which interacts with Python packaging. Two different types of providers are considered:
Platform providers (cloud environments, application hosting, etc.) and infrastructure service providers need to support package installers for their users to install Python dependencies. Most only support requirements.txt files and a smattering of other file formats for listing a project's dependencies. Most providers do not want to maintain support for more than one dependency specification format because of the complexity it adds to their software or service and the resources it takes to do so (e.g. not all platform providers have the staffing to support pip-tools, Poetry, Pipenv, etc.).
This PEP would allow platform providers to declare support for this PEP and thus only have to support one dependency specification format. Developers could then use whatever toolchain they preferred for development as long as it could emit a file that implemented this PEP. Developers would no longer have to align with what their platform provider supports, as long as everyone agrees to implement this PEP.
Integrated development environments may interact with Python package installation and management. Most only support a select few tools, and users are required to find workarounds to install their dependencies using other package installers. Similar to the situation with PaaS & IaaS providers, IDE providers do not want to maintain support for N different formats. Instead, tools would only need to be able to read files which implement this PEP to perform various actions (e.g. list all the dependencies of the open project, which ones are missing, install dependencies, generate the lock file, etc.).
As an example, the Python extension for VS Code has to have custom support for each installer tool people may use: pip, Poetry, Pipenv, etc. This is not only tedious by having to track multiple projects and any changes they make, but it also locks out newer tools whose popularity isn't great enough to warrant inclusion in the extension.
Developers are teams, people, or communities that code and use Python package installers and Python packages. Three different types of developers are considered:
Most PaaS and IaaS providers only support one Python dependency format: requirements.txt. This dictates the installers that developers can use while working with these providers, which might not be optimal for their application or workflow.
Developers adopting this PEP would be able to use third party platforms/infrastructure without having to worry about which Python package installer they are required to use as long as the provider also supports this PEP.
Most IDEs only support pip or a few Python package installers. Consequently, developers must use workarounds or hacky methods to install their dependencies if they use an unsupported package installer.
If the IDE supports this PEP, any developer could use whatever tooling they wanted to generate their lock file, while the IDE could use whatever tooling it wanted to perform actions on the lock file.
Developers want to be able to use the installer of their choice while working with other developers, but currently have to synchronize their installer choice for compatibility of dependency installation. If all preferred installers instead implemented the specified interface, it would allow for cross use of installers, allowing developers to choose an installer regardless of their collaborator’s preference.
Package upgraders and package infrastructure in CI/CD such as Dependabot, PyUP, etc. currently support a few formats. They work by parsing and editing the dependency files with relevant package information such as upgrades, downgrades, or new hashes. Similar to Platform and IDE providers, most of these providers do not want to support N different formats.
Currently, these services/bots have to implement support for each format individually. Inevitably, the most popular formats are supported first, and less popular tools are often never supported. By implementing this specification, these services/bots can support one format, allowing users to select the tool of their choice to generate the file. This will allow for more innovation in the space, as platforms and IDEs are no longer forced to prematurely select a "winner" tool which generates a lock file.
Specifying installer requirements and adopting this PEP will reduce the friction between Python package installers and people's workflows. Consequently, it will reduce the friction between Python package installers and 3rd party infrastructure/technologies such as PaaS or IDEs. Overall, it will allow for easier development, deployment and maintenance of Python projects as Python package installation becomes simpler and more interoperable.
Specifying a single file format can also increase the pace of innovation around installers and the generation of dependency graphs. Decoupling the generation of dependency graph details from installation allows each area to grow and innovate independently. It also allows more flexibility in tool selection at both the dependency-graph and installation ends of the process.
To begin, two key terms should be defined. A locker is a tool which produces a lock file. An installer is a tool which consumes a lock file to install the appropriate dependencies.
The expected information flow to occur if this PEP were accepted, from the specification of top-level dependencies to all necessary dependencies being installed in a fresh environment, is:
- Read top-level dependencies from pyproject.toml (PEP 621).
- Generate a lock file via a locker in pyproject-lock.d/.
- Install the appropriate dependencies based entirely on information contained in the lock file via an installer.
The file format should be machine-readable, machine-writable, and human-readable. Since the assumption is that the vast majority of lock files will be generated by a locker tool, the format should be easy for a locker to write. As install tools will be consuming the lock file, the format also needs to be easily read by an installer. But the format should also be readable by a person, as people will inevitably be performing audits on lock files. A format that does not lend itself to being read by people would hinder that. This includes changes to a lock file being readable in a diff format for auditing changes. It also means that the reason something is in the lock file should be comprehensible from a diff.
The lock file format needs to be general enough to support cross-platform and cross-environment specifications of dependencies. This allows having a single lock file which can work on a myriad of platforms and environments when that makes sense. This has been shown as a necessary feature by the various tools in the Python packaging ecosystem which already have a lock file format (e.g. Pipenv, Poetry, PDM). This can be accomplished by allowing (but not requiring) lockers to defer marker evaluation to the installer, and thus permitting the locker to include a wider range of possible dependencies that the installer has to work with.
The lock file also needs to support reproducible installations. If one wants to restrict what the lock file covers to a single platform to guarantee the exact dependencies and files which will be installed, that should be doable. This can be critical in security contexts for projects like SecureDrop.
When a computation could be performed either in the locker or installer, the preference is to perform the computation in the locker. This is because the assumption is a locker will be executed less frequently than an installer.
The installer should be able to resolve what to install based entirely on platform/environment information and what is contained within the lock file. There should be no need to use network or other file system I/O in order to resolve what to install.
The lock file should provide enough flexibility to allow lockers and installers to innovate. While the lock file specification provides a common denominator of functionality, it should not act as a ceiling for functionality.
Because of the expected size of lock files, no effort was put into making lock files human-writable.
This PEP makes no special provision for installers that use a lock file to install into a preexisting environment. The assumption is the installer is installing into a new/fresh environment.
Lock files MUST use the TOML file format thanks to its adoption by PEP 518 for pyproject.toml. This not only prevents the need to have another file format in the Python packaging ecosystem, but it also assists in making lock files human-readable.
Lock files MUST be kept in a directory named pyproject-lock.d. Lock files MUST end with a .toml file extension. Projects may have as many lock files as they want using whatever file name stems they choose. This PEP prescribes no specific way to automatically select between multiple lock files and installers SHOULD avoid guessing which lock file is "best-fitting" (this does not preclude situations where only a single lock file with a certain name is expected to exist and will be used by default, e.g. a documentation hosting site always using a lock file named pyproject-lock.d/rftd.toml when provided).
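The naming rules above can be sketched as follows. This is a minimal illustration, not part of the specification: the function names `list_lock_files` and `select_lock_file` are hypothetical, and requiring the caller to name the lock file explicitly is just one way for an installer to avoid guessing which file is "best-fitting".

```python
from pathlib import Path

LOCK_DIR_NAME = "pyproject-lock.d"


def list_lock_files(project_root):
    """Return the file name stems of all lock files in the project, sorted."""
    lock_dir = Path(project_root) / LOCK_DIR_NAME
    if not lock_dir.is_dir():
        return []
    # Only files ending in .toml count as lock files.
    return sorted(
        path.stem
        for path in lock_dir.iterdir()
        if path.is_file() and path.suffix == ".toml"
    )


def select_lock_file(project_root, stem):
    """Return the path of the explicitly named lock file; never guess."""
    available = list_lock_files(project_root)
    if stem not in available:
        raise FileNotFoundError(
            f"no lock file named {stem!r}; available: {available}"
        )
    return Path(project_root) / LOCK_DIR_NAME / f"{stem}.toml"
```

A caller would pass the stem chosen by the user, for example `select_lock_file(".", "dev")`.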
The following are the top-level keys of the TOML file data format.
The version of the lock file being used. The key MUST be specified and it MUST be set to 1. The number MUST always be an integer and it MUST only increment in future updates to the specification. What constitutes a version number increase is left to future PEPs or standards changes.
Tools reading a lock file whose version they don't support MUST raise an error.
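As an illustration of the version rule, here is a minimal sketch of the check a tool might run on already-parsed lock file data. The names `SUPPORTED_VERSIONS`, `UnsupportedLockFileVersion` and `check_version` are hypothetical, and `lock_data` is assumed to be the dictionary a TOML parser would return.

```python
SUPPORTED_VERSIONS = {1}


class UnsupportedLockFileVersion(Exception):
    """Raised when the version key is missing, malformed, or unknown."""


def check_version(lock_data):
    """Return the lock file version, or raise if this tool cannot read it."""
    version = lock_data.get("version")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(version, int) or isinstance(version, bool):
        raise UnsupportedLockFileVersion(
            f"version must be an integer, got {version!r}"
        )
    if version not in SUPPORTED_VERSIONS:
        raise UnsupportedLockFileVersion(
            f"unsupported lock file version: {version}"
        )
    return version
```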
A table containing data applying to the overall lock file.
The locker MAY specify an environment marker which specifies any restrictions the lock file was generated under (e.g. specific Python versions supported).
If the installer is installing for an environment which does not satisfy the specified environment marker, the installer MUST raise an error as the lock file does not support the environment.
An array of strings representing the package specifiers for the top-level/direct dependencies of the lock file as defined by the dependency specifier spec (i.e. the root of the dependency graph for the lock file).
Lockers MUST only allow specifiers which may be satisfiable by the lock file and the dependency graph the lock file encodes. Lockers MUST normalize project names according to the simple repository API.
A table containing arrays of tables for each dependency recorded in the lock file.
Each key of the table is the name of a package which MUST be normalized according to the simple repository API. If extras are specified as part of the project to install, the extras are to be included in the key name and are to be sorted in lexicographic order.
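A minimal sketch of building such a key follows. The normalization is the one used by the simple repository API (runs of `-`, `_` and `.` collapse to a single `-`, then lowercase). How extras are embedded in the key is not spelled out here, so the bracketed `name[extra1,extra2]` form and the helper name `package_key` are assumptions made purely for illustration.

```python
import re


def normalize_name(name):
    """Normalize a project name as the simple repository API does."""
    return re.sub(r"[-_.]+", "-", name).lower()


def package_key(name, extras=()):
    """Build a package table key (the bracketed extras syntax is assumed)."""
    key = normalize_name(name)
    if extras:
        # Extras are sorted in lexicographic order before being appended.
        key += "[" + ",".join(sorted(extras)) + "]"
    return key
```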
Within the file, the tables for the projects MUST be sorted by:
- Project/key name in lexicographic order
- Package version, newest/highest to older/lowest according to the version specifiers spec
- Extras via lexicographic order
A key containing an array of package names which depend on this package. The package names MUST match the package name as used in the package table.
The lack of a needed-by key implies that the package is a top-level package listed in metadata.needs.
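One way a locker could derive needed-by is by inverting the needs edges of the dependency graph. The sketch below assumes `packages` has the shape of the parsed package table (name mapped to a list of tables) and uses a deliberately simplified regular expression to pull the project name out of a specifier; extras, markers and URLs in specifiers are ignored. The helper names are hypothetical.

```python
import re

# Simplified: matches only the leading project name of a specifier.
_NAME = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?")


def requirement_name(specifier):
    """Return the normalized project name at the start of a specifier."""
    match = _NAME.match(specifier.strip())
    if match is None:
        raise ValueError(f"cannot find a project name in {specifier!r}")
    return re.sub(r"[-_.]+", "-", match.group()).lower()


def compute_needed_by(packages):
    """Map each package name to the sorted names of packages that need it."""
    needed_by = {name: set() for name in packages}
    for name, entries in packages.items():
        for entry in entries:
            for specifier in entry.get("needs", []):
                dependency = requirement_name(specifier)
                needed_by.setdefault(dependency, set()).add(name)
    return {name: sorted(parents) for name, parents in needed_by.items()}
```

A package that maps to an empty list is a top-level package, so a locker following this sketch would omit the needed-by key for it.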
An array of tables listing files that are available to satisfy the installation of the package for the specified version in the version key.
Each table has a type key which specifies how the code is stored. All other keys in the table are dependent on the value set for type. The acceptable values for type are listed below; all other possible values are reserved for future use.
Tables in the array MUST be sorted in lexicographic order of the value of type, then lexicographic order for the value of url.
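The ordering rule can be expressed as a single sort key. This sketch assumes every table in the array has both a type and a url key, as all the types defined in this document do; the function name `sort_code_tables` is hypothetical.

```python
def sort_code_tables(code_tables):
    """Return the code tables ordered by type, then by url."""
    return sorted(code_tables, key=lambda table: (table["type"], table["url"]))
```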
When recording a table, the fields SHOULD be listed in the order the fields are listed in this specification for consistency to make diffs of a lock file easier to read.
For all types other than "wheel", an installer MAY refuse to install code to avoid arbitrary code execution during installation.
An installer MUST verify the hash of any specified file.
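A minimal sketch of hash verification using Python's hashlib follows. It assumes the hash value is recorded as a hexadecimal digest, as in the example lock file in this document, and that the file contents are already in memory; the function name `verify_hash` is hypothetical.

```python
import hashlib


def verify_hash(data, hash_algorithm, hash_value):
    """Raise ValueError unless data hashes to the recorded hex digest."""
    try:
        hasher = hashlib.new(hash_algorithm, data)
    except ValueError:
        raise ValueError(f"unsupported hash algorithm: {hash_algorithm!r}")
    digest = hasher.hexdigest()
    if digest != hash_value.lower():
        raise ValueError(
            f"hash mismatch: expected {hash_value}, computed {digest}"
        )
```

For large files an installer would more likely feed the hasher in chunks via its `update()` method instead of reading everything into memory.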
Supported keys in the table are:
- url: a string of location of the wheel file (use the file: protocol for the local file system)
- hash-algorithm: a string of the algorithm used to generate the hash value stored in hash-value
- hash-value: a string of the hash of the file contents
- interpreter-tag: (optional) a string of the interpreter portion of the wheel tag as specified by the platform compatibility tags spec
- abi-tag: (optional) a string of the ABI portion of the wheel tag as specified by the platform compatibility tags spec
- platform-tag: (optional) a string of the platform portion of the wheel tag as specified by the platform compatibility tags spec
If the keys related to platform compatibility tags are absent then the installer MUST infer the tags from the URL's file name. If any of the platform compatibility tags are specified by a key in the table then a locker MUST provide all three related keys. The values of the keys may be compressed tags.
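Inferring the tags from the URL's file name can be sketched as below. It relies on the standard wheel file name layout, in which the last three dash-separated components before `.whl` are the interpreter, ABI and platform tags, optionally preceded by a build tag. The function names are hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def wheel_tags_from_url(url):
    """Return (interpreter-tag, abi-tag, platform-tag) from a wheel URL."""
    filename = posixpath.basename(unquote(urlsplit(url).path))
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # name-version-interpreter-abi-platform, plus an optional build tag.
    if len(parts) not in (5, 6):
        raise ValueError(f"malformed wheel file name: {filename!r}")
    interpreter_tag, abi_tag, platform_tag = parts[-3:]
    return interpreter_tag, abi_tag, platform_tag


def expand_compressed_tag(tag):
    """Split a compressed tag such as 'py2.py3' into its individual tags."""
    return tag.split(".")
```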
- url: a string of location of the sdist file (use the file: protocol for the local file system)
- hash-algorithm: a string of the algorithm used to generate the hash value stored in hash-value
- hash-value: a string of the hash of the file contents
- url: a string of location of the repository (use the file: protocol for the local file system)
- commit: a string of the commit of the repository which represents the version of the package
As the commit ID for a Git repository is a hash of the repository's contents, there is no hash to verify.
A source tree which can be used to build a wheel.
- url: a string of location of the source tree (use the file: protocol for the local file system)
- mime-type: (optional) a string representing the MIME type of the
- hash-algorithm: (optional for a local directory) a string of the algorithm used to generate the hash value stored in hash-value
- hash-value: (optional for a local directory) a string of the hash of the file contents
Installers MAY use the file extension, MIME type from HTTP headers, etc. to infer whether they support the storage mechanism used for the source tree. If the MIME type cannot be inferred and it is not specified via mime-type then an error MUST be raised.
If the source tree is NOT a local directory, then an installer MUST verify the hash value. Otherwise if the source tree is a local directory then the hash-algorithm and hash-value keys MUST be left out. The installer MAY warn the user of the use of a local directory due to the potential change in code since the lock file was created.
version = 1

[tool]
# Tool-specific table ala PEP 518's `[tool]` table.

[metadata]
marker = "python_version>='3.6'"
needs = ["mousebender"]

[[package.attrs]]
version = "21.2.0"
needed-by = ["mousebender"]

[[package.attrs.code]]
type = "wheel"
url = "https://files.pythonhosted.org/packages/20/a9/ba6f1cd1a1517ff022b35acd6a7e4246371dfab08b8e42b829b6d07913cc/attrs-21.2.0-py2.py3-none-any.whl"
hash-algorithm = "sha256"
hash-value = "149e90d6d8ac20db7a955ad60cf0e6881a3f20d37096140088356da6c716b0b1"

[[package.mousebender]]
version = "2.0.0"
needs = ["attrs>=19.3", "packaging>=20.3"]

[[package.mousebender.code]]
type = "sdist"
url = "https://files.pythonhosted.org/packages/35/bc/db77f8ca1ccf85f5c3324e4f62fc74bf6f6c098da11d7c30ef6d0f43e859/mousebender-2.0.0.tar.gz"
hash-algorithm = "sha256"
hash-value = "c5953026378e5dcc7090596dfcbf73aa5a9786842357273b1df974ebd79bd760"

[[package.mousebender.code]]
type = "wheel"
url = "https://files.pythonhosted.org/packages/f4/b3/f6fdbff6395e9b77b5619160180489410fb2f42f41272994353e7ecf5bdf/mousebender-2.0.0-py3-none-any.whl"
hash-algorithm = "sha256"
hash-value = "a6f9adfbd17bfb0e6bb5de9a27083e01dfb86ed9c3861e04143d9fd6db373f7c"

[[package.packaging]]
version = "20.9"
needs = ["pyparsing>=2.0.2"]
needed-by = ["mousebender"]

[[package.packaging.code]]
type = "git"
url = "https://github.com/pypa/packaging.git"
commit = "53fd698b1620aca027324001bf53c8ffda0c17d1"

[[package.pyparsing]]
version = "2.4.7"
needed-by = ["packaging"]

[[package.pyparsing.code]]
type = "wheel"
url = "https://files.pythonhosted.org/packages/8a/bb/488841f56197b13700afd5658fc279a2025a39e22449b7cf29864669b15d/pyparsing-2.4.7-py2.py3-none-any.whl"
hash-algorithm = "sha256"
hash-value = "ef9d7589ef3c200abe66653d3f1ab1033c3c419ae9b9bdb1240a85b024efc88b"
interpreter-tag = "py2.py3"
abi-tag = "none"
platform-tag = "any"
Installers MUST implement the direct URL origin of installed distributions spec as all packages installed from a lock file inherently originate from a URL and not a search of an index by package name and version.
Installers MUST error out if they encounter something they are unable to handle (e.g. lack of environment marker support).
- Have the user specify which lock file they would like to use in pyproject-lock.d (e.g. dev, prod)
- Check if the environment supports what is specified in metadata.tags; error out if it doesn't
- Check if the environment supports what is specified in metadata.marker; error out if it doesn't
- Gather the list of package names from metadata.needs, and for each listed package ...
  - Resolve any markers to find the appropriate package to install
  - Find the most appropriate code to install for the package
  - Repeat the above steps for packages listed in the needs key for each package found to install
- For each project collected to install ...
  - Gather the specified code for the package
  - Verify hashes of code
  - Install the packages (if necessary)
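The graph-walking part of the steps above can be sketched as follows. This is only an illustration under simplifying assumptions: `lock_data` is the dictionary a TOML parser would return, the first table listed for each package is taken without evaluating markers or version specifiers, and selecting code, verifying hashes and installing are left out entirely. The function names are hypothetical.

```python
import re


def _requirement_name(specifier):
    """Return the normalized project name at the start of a specifier."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", specifier.strip())
    if match is None:
        raise ValueError(f"cannot find a project name in {specifier!r}")
    return re.sub(r"[-_.]+", "-", match.group()).lower()


def collect_packages(lock_data):
    """Return every package name reachable from metadata.needs, in visit order."""
    packages = lock_data["package"]
    pending = [_requirement_name(s) for s in lock_data["metadata"]["needs"]]
    collected = []
    while pending:
        name = pending.pop(0)
        if name in collected:
            continue
        if name not in packages:
            raise LookupError(f"{name!r} is needed but not in the lock file")
        # Assumption: take the first listed entry; a real installer would
        # resolve markers and specifiers to pick the appropriate one.
        entry = packages[name][0]
        collected.append(name)
        pending.extend(_requirement_name(s) for s in entry.get("needs", []))
    return collected
```

Applied to the example lock file in this document, the walk starts at mousebender and then reaches attrs, packaging and pyparsing.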
As there is no pre-existing specification regarding lock files, there are no explicit backwards compatibility concerns.
As for pre-existing tools that have their own lock file, some updating will be required. Most document the lock file name, but not its contents, in which case the file name of the lock file(s) is the important part. For projects which do not commit their lock file to version control, they will need to update the equivalent of their .gitignore file. For projects that do commit their lock file to version control, what file(s) get committed will need an update.
A lock file should not introduce security issues but instead help solve them. By requiring the recording of hashes of code, a lock file is able to help prevent tampering with code since the hash details were recorded. A lock file also helps prevent unexpected package updates being installed which may be malicious.
Teaching of this PEP will very much be dependent on the lockers and installers being used for day-to-day use. Conceptually, though, users could be taught that the pyproject-lock.d directory contains files which specify what should be installed for a project to work. The benefits of consistency and security should be emphasized to help users realize why they should care about lock files.
No proof-of-concept or reference implementation currently exists.
- TOML already being used for pyproject.toml
- TOML being more human-readable
- TOML leading to better diffs
the decision was made to go with TOML. There was some concern over Python's standard library lacking a TOML parser, but most packaging tools already use a TOML parser thanks to pyproject.toml so this issue did not seem to be a showstopper. Some have also argued against this concern in the past by the fact that if packaging tools abhor installing dependencies and feel they can't vendor a package then the packaging ecosystem has much bigger issues to rectify than needing to depend on a third-party TOML parser.
The name __lockfile__ was briefly considered, but the directory would not sort next to pyproject.toml in instances where files and directories were sorted together in lexicographic order. The current naming is also more obvious in terms of its relationship to pyproject.toml.
At one point the idea of not using a directory of lock files but a single lock file which contained all possible lock information was considered. But it quickly became apparent that trying to devise a data format which could encompass both a lock file format which could support multiple environments as well as strict lock outcomes for reproducible builds would become quite complex and cumbersome.
The idea of supporting a directory of lock files as well as a single lock file named pyproject-lock.toml was also considered. But any possible simplicity from skipping the directory in the case of a single lock file seemed unnecessary. Trying to define appropriate logic for what should be the pyproject-lock.toml file and what should go into pyproject-lock.d seemed unnecessarily complicated.
The first version of this PEP proposed that the lock file have no concept of a dependency graph. Instead, the lock file would list exactly what should be installed for a specific platform such that installers did not have to make any decisions about what to install, only validating that the lock file would work for the target platform.
This idea was eventually rejected due to the number of combinations of potential PEP 508 environment markers. The decision was made that trying to have lockers generate all possible combinations when a project wants to be cross-platform would be too much.
It is technically possible for a project to specify different dependencies between its various wheel files. Taking that into consideration would then require the lock file to operate not per-project but per-file. Luckily, specifying different dependencies in this way is very rare and frowned upon and so it was deemed not worth supporting.
Instead of a monotonically increasing integer, using a float was considered to attempt to convey semantic versioning. In the end, though, it was deemed more hassle than it was worth as adding a new key would likely constitute a "major" version change (only if the key was entirely optional would it be considered "minor"), and experience with the core metadata spec suggests there's a bigger chance parsing will be relaxed or made more strict, which is also a "major" change. As such, the simplicity of using an integer made sense.
Some other names for what became needs were installs and dependencies. In the end a Python beginner was asked which term they preferred and they found needs clearer. Since there wasn't any reason to disagree with that, the decision was to go with needs.
Other names that were considered were dependents, depended-by, supports and required-by. In the end, needed-by made sense and tied into needs.
While reproducibility is serviced better by only allowing a single code location, it limits usability for situations where one wants to support multiple platforms with a single lock file (which the community has shown is desired).
Accepting PEP 650
PEP 650 was an earlier attempt at trying to tackle this problem by specifying an API for installers instead of standardizing on a lock file format (ala PEP 517). The initial response to PEP 650 could be considered mild/lukewarm. People seemed to be consistently confused over which tools should provide what functionality to implement the PEP. It also potentially incurred more overhead as it would require executing Python APIs to perform any actions involving packaging.
This PEP chose to standardize around an artifact instead of an API (ala PEP 621). This would allow for more tool integrations as it removes the need to specifically use Python to do things such as create a lock file, update it, or even install packages listed in a lock file. It also allows for easier introspection by forcing dependency graph details to be written in a human-readable format. It also allows for easier sharing of knowledge by standardizing more of what people need to know (e.g. tutorials become more portable between tools when it comes to understanding the artifact they produce). It's also simply the approach other language communities have taken and seem to be happy with.
It has been suggested to allow for custom type values in the code table. They would be prefixed with x- and followed by the tool's name and then the type, i.e. x-<tool>-<type>. This would provide enough flexibility for things such as other version control systems, innovative container formats, etc. to be officially usable in a lock file.
This could include predefined variables like PROJECT_ROOT for the directory containing pyproject-lock.d so URLs to local directories and files could be relative to the project itself.
Environment variables could be supported to avoid hardcoding things such as user credentials for Git.
It has been suggested that since installers may very well allow users to specify the path to a lock file that having this PEP say that "MUST be kept in a directory named pyproject-lock.d" is pointless as it is bound to be broken. As such, the suggestion is to change "MUST" to "SHOULD".
Since the modification date is not guaranteed to match when the lock file was generated, it has been suggested to record the date as part of the file's metadata. The question, though, is how useful is this information and can lockers that care put it into their [tool] table instead of mandating it be set?
Thanks to PEP 518, source trees and sdists can specify what build tools must be installed in order to build a wheel (or sdist in the case of a source tree). It has been suggested that the lock file also record such packages so to increase how reproducible an installation can be.
There is nothing currently in this PEP, though, that prohibits a locker from recording build tools, thanks to metadata.needs acting as the entry point for calculating what to install. There is also a cost in downloading all potential sdists and source trees, reading their pyproject.toml files, and then calculating their build dependencies for locking purposes, a cost that not everyone will want to pay.
While the needs key allows for recording dependency specifiers, this PEP does not currently require the needs key to record the exact Requires-Dist metadata that was used to calculate the lock file. It has been suggested that recording the inputs would help in auditing the outcome of the lock file.
If this were to be done, it would be a key named requested which lived alongside needs and would only be specified if it would differ from what is specified in needs.
Thanks to Kushal Das for making sure reproducible builds stayed a concern for this PEP.
Thanks to Andrea McInnes for settling the bikeshedding and choosing the paint colour of needs.
- https://packaging.python.org/specifications/dependency-specifiers/
- https://packaging.python.org/specifications/direct-url/
- https://pypi.org/project/pdm/
- https://pypi.org/project/pipenv/
- https://packaging.python.org/specifications/platform-compatibility-tags/
- https://pypi.org/project/poetry/
- https://packaging.python.org/specifications/simple-repository-api/
- https://packaging.python.org/specifications/source-distribution-format/
- https://packaging.python.org/specifications/version-specifiers/
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.