Title: Metadata for Python Software Packages 2.2
Author: Philippe Ombredanne <pombredanne at nexb.com>
Sponsor: Paul Moore <p.f.moore at gmail.com>
BDFL-Delegate: Paul Moore <p.f.moore at gmail.com>
- Core Metadata Specification updates
- Summary of Differences From PEP 566
- Backwards Compatibility
- Security Implications
- How to Teach Users to Use License Expressions
- Reference Implementation
- Rejected ideas
- Appendix 1. License Expression example
- Appendix 2. Surveying how we document licenses today in Python
- Appendix 3. Surveying how other package formats document licenses
The primary change introduced in this PEP updates how licenses are documented in core metadata: the License field can hold license expression strings using SPDX license identifiers, so that license documentation is simpler and less ambiguous:
- for package authors to create,
- for package users to read and understand, and,
- for tools to process package license information mechanically.
The other changes include:
- specifying a License-File field which is already used by wheel and setuptools to include license files in built distributions.
- defining how tools can validate license expressions and report warnings to users for invalid expressions (but still accept any string as License).
This PEP's scope is limited strictly to how we document the license of a distribution:
- with an improved and structured way to document a license expression, and,
- by including license texts in a built package.
The core metadata specification updates that are part of this PEP have been designed to have minimal impact and to be backward compatible with v2.1. These changes build on emerging ways to document licenses that are already in use in some tools (e.g. by adding the License-File field already used in wheel and setuptools) or by some package authors (e.g. storing an SPDX license expression in the existing License field).
In addition to an update to the metadata specification, this PEP contains:
- recommendations for publishing tools on how to validate the License and Classifier fields and report informational warnings when a package uses an older, non-structured style of license documentation conventions.
- informational appendixes that contain surveys of how we document licenses today in Python packages and elsewhere, and a reference Python library to parse, validate and build correct license expressions.
It is the intent of the PEP authors to work closely with tool authors to implement the recommendations for validation and warnings specified in this PEP.
This PEP is neutral regarding the choice of various licenses.
In particular, the SPDX license expression syntax proposed in this PEP provides simpler and more expressive conventions to document more accurately any kind of license that applies to a Python package, whether under an open source, free or libre software license or a proprietary license.
This PEP makes no recommendation for certain licenses and does not require the use of specific license documentation conventions. This PEP also does not impose any restrictions when uploading to PyPI.
Instead, this PEP is intended to document common practices already in use, and recommends that publishing tools should encourage users via informational warnings when they do not follow this PEP's recommendations.
This PEP is not about documenting license in code files, even though this is a surveyed topic in the appendix.
It is the intention of the authors of this PEP to consider the submission of related but separate PEPs in the future such as:
- make License and new License-File fields mandatory including stricter enforcement in tools and PyPI publishing.
- require uploads to PyPI to use only FOSS (Free and Open Source software) licenses.
Software is licensed, and providing accurate licensing information to Python package users is an important matter. Today, there are multiple places where a license is documented in package metadata and there are limitations to what can be documented. This often leads to confusion or a lack of clarity for both package authors and package users.
Several package authors have expressed difficulty and/or frustrations with the possibilities to express licensing in package metadata. This also applies to Linux and BSD* distribution packagers. This has triggered several license-related discussions and issues and in particular:
On average, Python packages tend to have more ambiguous or missing license information than other common application package formats (such as npm, Maven or Gem), as can be seen in the statistics page of the ClearlyDefined project, which covers all packages from PyPI, Maven, npm and Rubygems. ClearlyDefined is an open source project to help improve the clarity of other open source projects; it is incubating at the OSI (Open Source Initiative).
A mini survey of existing license metadata definitions in use in the Python ecosystem today and documented in several other system/distro and application package formats is provided in Appendix 2 of this PEP.
There are a few takeaways from the survey:
- Most package formats use a single License field.
- Many modern package formats use some form of license expression syntax to optionally combine more than one license identifier. SPDX and SPDX-like syntaxes are the most popular in use.
- SPDX license identifiers are becoming a de-facto way to reference common licenses everywhere, whether or not a license expression syntax is used.
- Several package formats support documenting both a license expression and the paths of the corresponding files that contain the license text. Most free and open source software licenses require that their full text be included in a distribution.
These considerations have guided the design and recommendations of this PEP.
The reuse of the License field with license expressions will provide an intuitive and more structured way to express the license of a distribution using a well-defined syntax and well-known license identifiers.
Over time, recommending the usage of these expressions will help Python package publishers improve the clarity of their license documentation to the benefit of package authors, consumers and redistributors.
The canonical source for the names and semantics of each of the supported metadata fields is the Core Metadata Specification document.
The updates to the Core Metadata Specification document considered as part of this PEP are detailed here and will be added to the canonical source once this PEP is approved.
The License-File is a string that is a .dist-info relative path to a license file. The license file content MUST be UTF-8 encoded text.
Build tools SHOULD honor this field and include the corresponding license file(s) in the built package.
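As a minimal sketch of how a tool might read these fields, the snippet below parses metadata text with the standard library `email` parser, which handles the RFC 822-style header layout used by core metadata. The sample metadata, the `read_license_info` helper name, and the assumption that License-File is repeated once per file are illustrative and not part of this PEP.

```python
from email.parser import Parser

# Illustrative metadata text; the project name and file names are hypothetical.
SAMPLE_METADATA = """\
Metadata-Version: 2.2
Name: example
Version: 1.0
License: MIT AND (Apache-2.0 OR BSD-2-Clause)
License-File: LICENSE.MIT
License-File: LICENSE.packaging
"""


def read_license_info(metadata_text):
    """Return (license_value, license_files) from core metadata text."""
    message = Parser().parsestr(metadata_text)
    license_value = message.get("License")
    # Assumption: one License-File header per license file.
    license_files = message.get_all("License-File") or []
    return license_value, license_files


license_value, license_files = read_license_info(SAMPLE_METADATA)
print(license_value)
print(license_files)
```

Returning an empty list when the field is absent lets callers iterate over the result without a separate check.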
Text indicating the license covering the distribution. This text can be either a valid license expression as defined here or any free text.
Publishing tools SHOULD issue an informational warning if this field is empty, missing, or is not a valid license expression as defined here. Build tools MAY issue a similar warning.
A license expression is a string using the SPDX license expression syntax as documented in the SPDX specification, using either Version 2.2 or a later compatible version. SPDX is a working group at the Linux Foundation that defines a standard way to exchange package information.
When used in the License field and as a specialization of the SPDX license expression definition, a license expression can use the following license identifiers:
- any SPDX-listed license short-form identifiers that are published in the SPDX License List, using either Version 3.10 of this list or any later compatible version. Note that the SPDX working group never removes any license identifiers: instead they may only mark an identifier as "obsolete".
- the LicenseRef-Public-Domain and LicenseRef-Proprietary strings to support generic identifiers that are not available in the SPDX license list.
When processing the License field to determine if it contains a valid license expression, tools:
- MUST ignore the case of the License field
- SHOULD report an informational warning if one or more of the following applies:
- the field does not contain a license expression,
- the license expression syntax is invalid,
- the license expression syntax is valid but some license identifiers are unknown as defined here or the license identifiers have been marked as deprecated in the SPDX License List,
- SHOULD store a case-normalized version of the License field using the reference case for each SPDX license identifier and uppercase for the AND, OR and WITH keywords, and SHOULD report an informational warning if the reference case is not used.
License expression examples:
    License: MIT
    License: BSD-3-Clause
    License: MIT OR GPL-2.0-or-later OR (FSFUL AND BSD-2-Clause)
    License: GPL-3.0-only WITH Classpath-Exception-2.0 OR BSD-3-Clause
    License: This software may only be obtained by sending the
        author a postcard, and then the user promises not
        to redistribute it.
    License: LicenseRef-Proprietary AND LicenseRef-Public-Domain
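The processing rules above could be sketched roughly as follows. This is a simplified illustration, not the reference implementation: `KNOWN_IDS` is a tiny hand-picked sample standing in for the full SPDX License List (and lumps licenses and exceptions together), the `check_license` name is hypothetical, and AND/OR precedence is not modeled because only the token order is preserved.

```python
import re

# Assumption: a small sample; a real tool would load the SPDX License List.
KNOWN_IDS = [
    "MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0", "FSFUL",
    "GPL-2.0-or-later", "GPL-3.0-only", "Classpath-exception-2.0",
    "LicenseRef-Public-Domain", "LicenseRef-Proprietary",
]
REFERENCE_CASE = {name.lower(): name for name in KNOWN_IDS}
KEYWORDS = {"and", "or", "with"}
TOKEN = re.compile(r"\(|\)|[A-Za-z0-9.+-]+")


def check_license(value):
    """Return (normalized_expression_or_None, warnings) for a License value."""
    tokens = TOKEN.findall(value)
    out, warnings, pos = [], [], 0
    # Characters outside the token alphabet mean this is free text.
    if "".join(tokens) != "".join(value.split()):
        return None, ["not a valid license expression"]

    def peek():
        return tokens[pos].lower() if pos < len(tokens) else None

    def advance(text):
        nonlocal pos
        if text != tokens[pos]:
            warnings.append(f"reference case not used: {tokens[pos]}")
        out.append(text)
        pos += 1

    def identifier():
        tok = peek()
        if tok is None or tok in KEYWORDS or tok in ("(", ")"):
            raise ValueError("expected a license identifier")
        if tok not in REFERENCE_CASE:
            warnings.append(f"unknown license identifier: {tokens[pos]}")
        advance(REFERENCE_CASE.get(tok, tokens[pos]))

    def atom():
        if peek() == "(":
            advance("(")
            expression()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance(")")
        else:
            identifier()
            if peek() == "with":
                advance("WITH")
                identifier()

    def expression():
        atom()
        while peek() in ("and", "or"):
            advance(peek().upper())
            atom()

    try:
        expression()
        if pos != len(tokens):
            raise ValueError("unexpected trailing text")
    except ValueError as error:
        return None, [f"not a valid license expression: {error}"]
    text = " ".join(out).replace("( ", "(").replace(" )", ")")
    return text, warnings


print(check_license("mit or (apache-2.0 and BSD-2-Clause)"))
print(check_license("Apache2"))
```

The function only collects warnings and never raises, which matches the intent that any string remains acceptable as a License value.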
Each entry is a string giving a single classification value for the distribution. Classifiers are described in PEP 301.
    Classifier: Development Status :: 4 - Beta
    Classifier: Environment :: Console (Text Based)
Tools SHOULD issue an informational warning if this field contains a licensing related classifier string starting with the License :: prefix and SHOULD suggest the use of a license expression in the License field instead.
If the License field is present and contains a valid license expression, publishing tools MUST NOT also provide any licensing related classifier entries.
However, for compatibility with existing publishing and installation processes, licensing-related classifier entries SHOULD continue to be accepted if the License field is absent or does not contain a valid license expression.
Publishing tools MAY infer a license expression from the provided classifier entries if they are able to do so unambiguously.
However, no new licensing related classifiers will be added, with anyone requesting them being directed to use a license expression in the License field instead. Note that the licensing related classifiers may be deprecated in a future PEP.
Publishing tools MAY infer or suggest an equivalent license expression from the provided License or Classifier information if they are able to do so unambiguously. For instance, if a package only has this license classifier:
Classifier: License :: OSI Approved :: MIT License
Then the corresponding value for a License field using a valid license expression to suggest would be:
    License: MIT
Here are mapping guidelines for the legacy classifiers:
- Classifier License :: Other/Proprietary License becomes License: LicenseRef-Proprietary expression.
- Classifier License :: Public Domain becomes License: LicenseRef-Public-Domain expression, though tools should encourage the use of more explicit and legally portable license identifiers such as CC0-1.0 or the Unlicense: the meaning associated with the term "public domain" is thoroughly dependent on the specific legal jurisdiction involved and some jurisdictions have no concept of Public Domain as it exists in the USA.
- The generic and ambiguous classifiers License :: OSI Approved and License :: DFSG approved do not have an equivalent license expression.
- The generic and sometimes ambiguous classifiers License :: Free For Educational Use, License :: Free For Home Use, License :: Free for non-commercial use, License :: Freely Distributable, License :: Free To Use But Restricted, and License :: Freeware are mapped to the generic License: LicenseRef-Proprietary expression.
- Classifiers License :: GUST* have no mapping to SPDX license identifiers for now and no package uses them in PyPI as of the writing of this PEP.
The remainder of the classifiers using a License :: prefix map to a simple single license expression using the corresponding SPDX license identifiers.
When multiple license-related classifiers are used, their relation is ambiguous and it is typically not possible to determine whether all the licenses apply or whether there is a choice among the licenses. In this case, tools cannot reliably infer a license expression to suggest from the legacy classifiers alone.
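As an illustration of these guidelines, a tool might keep a lookup table and decline to suggest anything when the classifiers are ambiguous. The table below is a small illustrative subset rather than a complete mapping, and the `suggest_expression` name is hypothetical.

```python
# Assumption: an illustrative subset of the mapping, not a complete table.
CLASSIFIER_TO_EXPRESSION = {
    "License :: OSI Approved :: MIT License": "MIT",
    "License :: Other/Proprietary License": "LicenseRef-Proprietary",
    "License :: Public Domain": "LicenseRef-Public-Domain",
    "License :: Freeware": "LicenseRef-Proprietary",
    "License :: Freely Distributable": "LicenseRef-Proprietary",
}


def suggest_expression(classifiers):
    """Suggest a license expression, or None if no unambiguous one exists."""
    license_classifiers = [
        c for c in classifiers if c.startswith("License ::")
    ]
    # Zero classifiers gives nothing to map; several are ambiguous because
    # it is unknown whether they combine with AND or with OR.
    if len(license_classifiers) != 1:
        return None
    # Generic classifiers such as "License :: OSI Approved" are absent
    # from the table and therefore yield None.
    return CLASSIFIER_TO_EXPRESSION.get(license_classifiers[0])


print(suggest_expression([
    "Development Status :: 4 - Beta",
    "License :: OSI Approved :: MIT License",
]))
```

Returning `None` rather than a guess keeps the suggestion step optional, so a tool can fall back to a plain warning.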
Summary of Differences From PEP 566
- Metadata-Version is now 2.2.
- Added one new field: License-File
- Updated the documentation of two fields: License and Classifier
The reuse of the License field means that we keep backward compatibility. The specification of the License-File field is only writing down the practices of the wheel and setuptools tools and is backward compatible with their support for that field.
The "soft" validation of the License field when it does not contain a valid license expression and when the Classifier field is used with legacy license-related classifiers means that we can gently prepare users for a possible strict and incompatible validation of these fields in the future.
This PEP has no foreseen security implications: the License field is a plain string and the License-File(s) are file paths. None of them introduces any new security concern.
The simple cases are simple: a single license id is a valid license expression and a large majority of packages use a single license.
The plan to teach users of packaging tools how to express a license with a valid license expression is to have tools issue warning messages when they detect an incorrect license expression or when a license-related classifier is used in the Classifier field.
With a warning message that does not terminate processing, publishing tools will gently teach users on how to provide correct license expressions over time.
Tools may also help with the conversion and suggest a license expression in some cases:
- The section Mapping Legacy Classifiers to New License expressions provides tools authors with guidelines on how to suggest a license expression from legacy classifiers.
- Tools may also be able to infer and suggest how to update an existing incorrect License value and convert that to a correct license expression. For instance a tool may suggest to correct a License field from Apache2 (which is not a valid license expression as defined in this PEP) to Apache-2.0 (which is a valid license expression using an SPDX license id as defined in this PEP).
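One possible way to produce such a suggestion is fuzzy string matching against the list of known identifiers. The sketch below uses the standard library `difflib` module. The `KNOWN_IDS` sample, the `suggest_identifier` name and the 0.7 similarity cutoff are assumptions chosen for illustration, and a real tool would likely need a more careful heuristic.

```python
import difflib

# Assumption: a small sample standing in for the SPDX License List.
KNOWN_IDS = ["Apache-2.0", "MIT", "BSD-2-Clause", "BSD-3-Clause", "GPL-3.0-only"]


def suggest_identifier(value, known_ids=KNOWN_IDS):
    """Return the closest known identifier, or None if nothing is close."""
    by_lower = {name.lower(): name for name in known_ids}
    # Compare in lowercase so that case differences do not count as errors.
    matches = difflib.get_close_matches(
        value.lower(), list(by_lower), n=1, cutoff=0.7
    )
    return by_lower[matches[0]] if matches else None


print(suggest_identifier("Apache2"))
```

A suggestion like this is only a hint for the user to confirm: similar-looking identifiers can name licenses with different terms.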
Tools will need to support parsing and validating license expressions in the License field.
The license-expression library is a reference Python implementation of a library that handles license expressions, including parsing, validating and formatting license expressions using flexible lists of license symbols (including SPDX license identifiers and any extra identifiers referenced here). It is licensed under the Apache-2.0 license and is used in a few projects such as the SPDX Python tools, the ScanCode toolkit and the Free Software Foundation Europe (FSFE) REUSE project.
- use a new License-Expression field and deprecate the License field.
Adding a new field would introduce backward incompatible changes when the License field is retired later, and would require more complex validation. Such a field would further introduce a new concept that is not seen in any other package metadata (i.e. a new field only for license expressions) and could be a source of confusion. Also, users are less likely to start using a new field than to make small adjustments to their use of existing fields.
- mapping licenses used in the license expression to specific files in the license files (or vice versa).
This would require using a mapping (two parallel lists would be too prone to alignment errors) and a mapping would bring extra complication to how licenses are documented by adding an additional nesting level.
A mapping would be needed because you cannot guarantee that every expression or every license key has exactly one license file: a GPL with an exception may be in a single file, while an Apache license LICENSE and its NOTICE file are two distinct files. Yet in most cases the situation is simpler: one license and one or more license files. In the rarer and more complex cases where many licenses are involved, you can still use the proposed conventions at the cost of a slight loss of clarity, by not specifying which text file is for which license identifier, without forcing the more complex data model (e.g. a mapping) on everyone who may not need it.
We could of course have a data field with multiple possible value types (it’s a string, it’s a list, it’s a mapping!) but this could be a source of confusion. This is what has been done for instance in npm (historically) and in Rubygems (still today); as a result you need to test the type of the metadata field before using it in code, and users are confused about when to use a list or a string.
- mapping licenses to specific source files and/or directories of source files (or vice versa).
File-level notices are not considered part of the scope of this PEP; the existing SPDX-License-Identifier convention can be used and may not need further specification as a PEP.
The current version of setuptools metadata does not use the License field. Instead, it uses this license-related information in setup.cfg:
    license_file = LICENSE
    classifiers =
        License :: OSI Approved :: MIT License
The simplest migration to this PEP would consist of using this instead:
    license = MIT
    license_files =
        LICENSE
Another possibility would be to include the licenses of the third-party packages that are vendored in the setuptools/_vendor/ and pkg_resources/_vendor directories:
    appdirs==1.4.3
    packaging==20.4
    pyparsing==2.2.1
    ordered-set==3.1.1
These are using these license expressions:
    appdirs: MIT
    packaging: Apache-2.0 OR BSD-2-Clause
    pyparsing: MIT
    ordered-set: MIT
Therefore, a comprehensive license documentation covering both setuptools proper and its vendored packages could contain these metadata, combining all the license expressions in one expression:
    license = MIT AND (Apache-2.0 OR BSD-2-Clause)
    license_files =
        LICENSE.MIT
        LICENSE.packaging
Here we would assume that the LICENSE.MIT file contains the text of the MIT license and the copyrights used by setuptools, appdirs, pyparsing and ordered-set, and that the LICENSE.packaging file contains the texts of the Apache and BSD licenses, its copyrights and its license choice notice.
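The way the individual expressions collapse into the combined one can be sketched with a small helper. The `combine_expressions` name is hypothetical, and the approach is deliberately naive: duplicates are detected by exact text only, and any expression containing a space is wrapped in parentheses, even when that is not strictly required.

```python
def combine_expressions(expressions):
    """Join license expressions with AND, dropping exact duplicates."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(e.strip() for e in expressions))
    if len(unique) == 1:
        return unique[0]
    # Parenthesize compound parts so that an inner OR keeps its meaning.
    parts = [f"({e})" if " " in e else e for e in unique]
    return " AND ".join(parts)


# setuptools itself, then appdirs, packaging, pyparsing and ordered-set.
print(combine_expressions([
    "MIT",
    "MIT",
    "Apache-2.0 OR BSD-2-Clause",
    "MIT",
    "MIT",
]))
```

The parentheses matter: without them the OR choice offered by the packaging license would no longer be grouped as a single alternative under the AND.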
There are multiple ways used or recommended to document Python package licenses today:
The core metadata documentation License field documentation is currently:
    License (optional)
    ::::::::::::::::::

    Text indicating the license covering the distribution where the license
    is not a selection from the "License" Trove classifiers. See
    "Classifier" below. This field may also be used to specify a
    particular version of a license which is named via the ``Classifier``
    field, or to indicate a variation or exception to such a license.

    Examples::

        License: This software may only be obtained by sending the
                 author a postcard, and then the user promises not
                 to redistribute it.

        License: GPL version 3, excluding DRM provisions
Even though there are two fields, it is at times difficult to convey anything but simpler licensing. For instance some classifiers lack accuracy (GPL without a version) and when you have multiple License-related classifiers it is not clear if this is a choice or all these apply and which ones. Furthermore, the list of available license-related classifiers is often out-of-date.
The latest PyPA sampleproject recommends using only classifiers in setup.py and does not list the license field in its example setup.py.
Beyond a license code or qualifier, license text files are documented and included in a built package either implicitly or explicitly and this is another possible source of confusion:
- In wheels, license files are automatically added to the .dist-info directory if they match one of a few common license file name patterns (such as LICENSE*, COPYING*). Alternatively, a package author can specify a list of license file paths to include in the built wheel using the license_files field in the [metadata] section of the project's setup.cfg. Previously this was a (singular) license_file attribute that is now deprecated but is still in common use.
- In setuptools, a license_file attribute is used to add a single license file to a source distribution. This singular version is still honored by wheels for backward compatibility.
- Using a LICENSE.txt file is encouraged in the packaging guide, paired with a MANIFEST.in entry to ensure that the license file is included in a built source distribution (sdist).
Note: the License-File field proposed in this PEP already exists in wheel and setuptools with the same behaviour as explained above. This PEP is only recognizing and documenting the existing practice as used in wheel (with the license_file and license_files setup.cfg [metadata] entries) and in the setuptools license_file setup() argument.
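The implicit, pattern-based inclusion described for wheels can be approximated with the standard library `fnmatch` module. This is only a sketch: the two patterns are the ones named above, the full pattern list used by wheel may differ, and the `find_license_files` name is hypothetical.

```python
import fnmatch

# Assumption: only the two patterns mentioned in the text above.
LICENSE_PATTERNS = ("LICENSE*", "COPYING*")


def find_license_files(file_names, patterns=LICENSE_PATTERNS):
    """Return the sorted file names that match any license pattern."""
    return sorted(
        name
        for name in file_names
        # fnmatchcase keeps the match case-sensitive on every platform.
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)
    )


print(find_license_files(
    ["LICENSE", "LICENSE.txt", "COPYING.LESSER", "README.rst", "setup.py"]
))
```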
(Note: Documenting licenses in source code is not in the scope of this PEP)
Besides using comments and/or SPDX-License-Identifier conventions, the license is sometimes documented in Python code files using dunder variables, typically named after one of the lowercased core metadata fields, such as __license__.
This convention (dunder global variables) is recognized by the built-in help() function and the standard pydoc module. The dunder variable(s) will show up in the help() DATA section for a module.
- Conda package manifest has support for license and license_file fields as well as a license_family license grouping field.
- flit recommends using classifiers instead of License (as per the current metadata spec).
- pbr uses similar data as setuptools but always stored in setup.cfg.
- poetry specifies the use of the license field in pyproject.toml with SPDX license identifiers.
Here is a survey of how things are done elsewhere.
Note: in most cases the license texts of the most common licenses are included globally once in a shared documentation directory (e.g. /usr/share/doc).
- Debian documents package licenses with machine readable copyright files. This specification defines its own license expression syntax that is very similar to the SPDX syntax and uses its own list of license identifiers for common licenses, also closely related to SPDX identifiers.
- Fedora RPM packages specify how to include License Texts and how to use a License field that must be filled with the appropriate short license identifier(s) from an extensive list of "Good Licenses" identifiers. Fedora also defines its own license expression syntax, very similar to the SPDX syntax.
- OpenSuse RPM packages use SPDX license expressions with both SPDX license identifiers and a list of extra license identifiers.
- Gentoo ebuilds use a LICENSE variable. This field is specified in GLEP-0023 and in the Gentoo development manual. Gentoo also defines a license expression syntax and a list of allowed licenses. The expression syntax is rather different from SPDX.
- FreeBSD package Makefiles provide a LICENSE and a LICENSE_FILE field with a list of custom license symbols. For non-standard licenses, FreeBSD recommends using LICENSE=UNKNOWN and adding LICENSE_NAME and LICENSE_TEXT fields, as well as sophisticated LICENSE_PERMS to qualify the license permissions and LICENSE_GROUPS to document a license grouping. The LICENSE_COMB field allows documenting more than one license and how they apply together, forming a custom license expression syntax. FreeBSD also recommends the use of SPDX-License-Identifier in source code files.
- Archlinux PKGBUILD defines its own license identifiers. The value 'unknown' can be used if the license is not defined.
- OpenWRT ipk packages  use the PKG_LICENSE and PKG_LICENSE_FILES variables and recommend the use of SPDX License identifiers.
- NixOS uses SPDX identifiers and some extra license identifiers in its license field.
- GNU Guix (based on NixOS) has a single License field, uses its own license symbols list  and specifies to use one license or a list of licenses .
- Alpine Linux apk packages  recommend using SPDX identifiers in its license field.
- In Java, Maven POM  defines a licenses XML tag with a list of license items each with a name, URL, comments and "distribution" type. This is not mandatory and the content of each field is not specified.
- Rubygems gemspec specifies either a singular license string or a list of license strings. The relationship between multiple licenses in a list is not specified. They recommend using SPDX license identifiers.
- CPAN Perl modules use a single license field which is either a single string or a list of strings. The relationship between the licenses in a list is not specified. There is a list of its own supported license identifiers plus these generic identifiers: open_source, restricted, unrestricted, unknown.
- Rust Cargo  specifies the use of an SPDX license expression (v2.1) in the license field. It also supports an alternative expression syntax using slash-separated SPDX license identifiers. There is also a license_file field. The crates.io package registry  requires that either license or license_file fields are set when you upload a package.
- PHP Composer composer.json uses a license field with an SPDX License id or "proprietary". The license field is either a single string that can use something which resembles the SPDX license expression syntax with "and" and "or" keywords, or a list of strings if there is a choice of licenses (aka. a "disjunctive" choice of license).
- NuGet packages were using only a simple license URL and now specify the use of an SPDX License expression and/or the path to a license file within the package. The NuGet.org repository states that it only accepts license expressions that are approved by the Open Source Initiative or the Free Software Foundation.
- Go language modules go.mod have no provision for any metadata beyond dependencies. Licensing information is left for code authors and other community package managers to document.
- Dart/Flutter spec  recommends to use a single LICENSE file that should contain all the license texts each separated by a line with 80 hyphens.
- Cocoapods podspec  license field is either a single string or a mapping with attributes of type, file and text keys. This is mandatory unless there is a LICENSE or LICENCE file provided.
- Haskell Cabal  accepts an SPDX license expression since version 2.2. The version of the SPDX license list used is a function of the cabal version. The specification also provides a mapping between pre-SPDX Legacy license Identifiers and SPDX identifiers. Cabal also specifies a license-file(s) field that lists license files that will be installed with the package.
- Erlang/Elixir mix/hex package  specifies a licenses field as a required list of license strings and recommends to use SPDX License identifiers.
- D lang dub package defines its own list of license identifiers and its own license expression syntax, and both are similar to the SPDX conventions.
- R Package DESCRIPTION defines its own sophisticated license expression syntax and list of license identifiers. R has a unique way to support specifiers for license versions such as LGPL (>= 2.0, < 3) in its license expression syntax.
- SPDX-License-Identifier  is a simple convention to document the license inside a code file.
- The Free Software Foundation (FSF) promotes using SPDX license identifiers for clarity in the GPL and other versioned free software licenses.
- The Free Software Foundation Europe (FSFE) REUSE project  promotes using SPDX-License-Identifier.
- The Linux kernel uses SPDX-License-Identifier and parts of the FSFE REUSE conventions to document its licenses .
- U-Boot spearheaded using SPDX-License-Identifier in code and now follows the Linux ways .
- The Apache Software Foundation projects use RDF DOAP  with a single license field pointing to SPDX license identifiers.
- The Eclipse Foundation promotes using SPDX-License-Identifier.
- The ClearlyDefined project  promotes using SPDX license identifiers and expressions to improve license clarity.
- The Android Open Source Project uses MODULE_LICENSE_XXX empty tag files where XXX is a license code such as BSD, APACHE, GPL, etc. Side-by-side with this MODULE_LICENSE file there is a NOTICE file that contains license and notice texts.
This document specifies version 2.2 of the metadata format.
- Version 1.0 is specified in PEP 241.
- Version 1.1 is specified in PEP 314.
- Version 1.2 is specified in PEP 345.
- Version 2.0, while not formally accepted, was specified in PEP 426.
- Version 2.1 is specified in PEP 566.
- https://packaging.python.org/specifications/core-metadata
- https://clearlydefined.io
- https://pypi.org/classifiers
- https://spdx.org/licenses
- https://reuse.software/
- https://spdx.org/using-spdx-license-identifier
- https://creativecommons.org/publicdomain/zero/1.0/
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
- Nick Coghlan
- Kevin P. Fleming
- Pradyun Gedam
- Oleg Grenrus
- Dustin Ingram
- Chris Jerdonek
- Cyril Roelandt
- Luis Villa