
Python Enhancement Proposals

Appendix: Rejected Ideas

Abstract

This document lists the alternatives to the ideas proposed in PEP 639, with detailed explanations of why they were rejected.

Core metadata fields

Potential alternatives to the structure, content and deprecation of the core metadata fields specified in PEP 639.

Re-use the License field

Following initial discussion, earlier versions of PEP 639 proposed re-using the existing License field, which tools would attempt to parse as an SPDX license expression with a fallback to free text. Initially, a value that failed to parse would merely cause a warning (or even pass silently), but would eventually be treated as an error by modern tooling.

This offered the potential benefit of greater backwards-compatibility, easing the community into using SPDX license expressions while taking advantage of packages that already have them (either intentionally or coincidentally), and avoided adding yet another license-related field.

However, following substantial discussion, consensus was reached that a dedicated License-Expression field was the preferred overall approach. The presence of this field is an unambiguous signal that a package intends it to be interpreted as a valid SPDX identifier, without the need for complex and potentially erroneous heuristics, and allows tools to easily and unambiguously detect invalid content.

This avoids both false positives (License values that a package author didn’t intend as an SPDX identifier, but that happen to validate as one) and false negatives (expressions the author intended to be valid SPDX, but that are not, due to a typo or mistake). Neither is otherwise clearly distinguishable from a true positive or negative, an ambiguity at odds with the goals of PEP 639.
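To make the ambiguity concrete, the following minimal sketch shows the kind of parse-with-fallback heuristic that re-using the License field would have required. It is illustrative only: looks_like_spdx, interpret_license_field and the tiny KNOWN_IDS set are hypothetical stand-ins for a real SPDX validator, and only flat AND/OR expressions are handled.

```python
# Hypothetical, tiny subset of SPDX license identifiers, for illustration only.
KNOWN_IDS = {"MIT", "Apache-2.0", "BSD-3-Clause", "GPL-3.0-only"}


def looks_like_spdx(value: str) -> bool:
    """Naive check: identifiers alternating with AND/OR operators."""
    tokens = value.split()
    if not tokens or len(tokens) % 2 == 0:
        return False
    for i, token in enumerate(tokens):
        expected = KNOWN_IDS if i % 2 == 0 else {"AND", "OR"}
        if token not in expected:
            return False
    return True


def interpret_license_field(value: str) -> str:
    """A re-used License field: try SPDX first, fall back to free text."""
    return "spdx" if looks_like_spdx(value) else "free-text"


print(interpret_license_field("MIT"))         # spdx, whether or not the author meant it as SPDX
print(interpret_license_field("Apache 2.0"))  # free-text, even if SPDX was intended
```

Both results are plausible readings of the input, which is exactly the problem: the heuristic cannot recover what the author meant.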

Furthermore, it allows both the existing License field and the license classifiers to be more easily deprecated, with tools able to cleanly distinguish between packages intending to affirmatively conform to the updated specification in PEP 639 or not, and adapt their behavior (warnings, errors, etc.) accordingly. Otherwise, tools would either have to allow duplicative and potentially conflicting License fields and classifiers, or warn/error on the substantial number of existing packages that have SPDX identifiers as the value for the License field, intentionally or otherwise (e.g. MIT).

Finally, it avoids changing the behavior of an existing metadata field, and avoids tools having to guess the Metadata-Version and field behavior based on its value rather than merely its presence.

While this would mean the subset of existing distributions containing License fields valid as SPDX license expressions wouldn’t automatically be recognized as such, this only requires appending a few characters to the key name in the project’s source metadata, and PEP 639 provides extensive guidance on how this can be done automatically by tooling.

Given all this, it was decided to proceed with defining a new, purpose-created field, License-Expression.

Re-use the License field with a value prefix

As an alternative to the previous idea, prefixing SPDX license expressions with a marker such as spdx: was suggested to reduce the ambiguity inherent in re-using the License field. However, this effectively amounted to creating a field within a field, and doesn’t address all the downsides of keeping the License field. Namely, it still changes the behavior of an existing metadata field, requires tools to parse its value to determine how to handle its content, and makes the specification and deprecation process more complex and less clean.

Yet it still shares the same main potential downside as just creating a new field: projects currently using valid SPDX identifiers in the License field, intentionally or not, won’t be automatically recognized, and fixing this requires about the same amount of effort, namely changing a line in the project’s source metadata. Therefore, it was rejected in favor of a new field.

Don’t make License-Expression mutually exclusive

For backward compatibility, the License field and/or the license classifiers could still be allowed together with the new License-Expression field, presumably with a warning. However, this could easily lead to inconsistent, or at the very least duplicative, license metadata in no fewer than three different fields, which is squarely contrary to PEP 639’s goal of making the licensing story simpler and unambiguous. Therefore, and in concert with clear community consensus to the contrary, this idea was soundly rejected.

Don’t deprecate existing License field and classifiers

Several community members were initially concerned that deprecating the existing License field and classifiers would result in excessive churn for existing package authors and raise the barrier to entry for new ones, particularly everyday Python developers seeking to package and publish their personal projects without necessarily caring too much about the legal technicalities or being a “license lawyer”. Indeed, every deprecation comes with some non-zero short-term cost, and should be carefully considered relative to the overall long-term net benefit. And at a minimum, this change shouldn’t make it more difficult for the average Python developer to share their work under a license of their choice, and ideally should improve the situation.

Following many rounds of proposals, discussion and refinement, the general consensus was clearly in favor of deprecating the legacy means of specifying a license, in favor of “one obvious way to do it”, to improve the currently complex and fragmented story around license documentation. Not doing so would leave three different un-deprecated ways of specifying a license for a package, two of them ambiguous, unclear to use, inconsistently documented and out of date. This is more complex for all tools in the ecosystem to support indefinitely (rather than simply installers supporting older packages implementing previous frozen metadata versions), resulting in a non-trivial and unbounded maintenance cost.

Furthermore, it leads to a more complex and confusing landscape for users with three similar but distinct options to choose from, particularly with older documentation, answers and articles floating around suggesting different ones. Of the three, License-Expression is the simplest and clearest to use correctly; users just paste in their desired license identifier, or select it via a tool, and they’re done; no need to learn about Trove classifiers and dig through the list to figure out which one(s) apply (and be confused by many ambiguous options), or figure out on their own what should go in the license key (anything from nothing, to the license text, to a free-form description, to the same SPDX identifier they would be entering in the license key anyway, assuming they can easily find documentation at all about it). In fact, this can be made even easier thanks to the new field. For example, GitHub’s popular ChooseALicense.com links to how to add SPDX license identifiers to the project source metadata of various languages that support them right in the sidebar of every license page; the SPDX support in this PEP enables adding Python to that list.

For current package maintainers who have specified a License or license classifiers, PEP 639 only recommends warnings, and prohibits errors for all but publishing tools, which are allowed to error if their intended distribution platform(s) so require. Once maintainers are ready to upgrade, for those already using SPDX license expressions (accidentally or not) this only requires appending a few characters to the key name in the project’s source metadata, and for those with license classifiers that map to a single unambiguous license, or another defined case (public domain, proprietary), they merely need to drop the classifier and paste in the corresponding license identifier. PEP 639 provides extensive guidance and examples, as will other resources, as well as explicit instructions for automated tooling to take care of this with no human changes needed. More complex cases may need a bit of human intervention, but in most cases tools will be able to provide a list of options following the mappings in PEP 639; these are also typically the projects most constrained by the limitations of the existing license metadata, and thus those that benefit most from the new fields in PEP 639.

Finally, for unmaintained packages, those using tools supporting older metadata versions, or those who choose not to provide license metadata, no changes are required regardless of the deprecation.

Don’t mandate validating new fields on PyPI

Previously, while PEP 639 did include normative guidelines for package publishing tools (such as Twine), it did not provide specific guidance for PyPI (or other package indices) as to whether and how they should validate the License-Expression or License-File fields, nor how they should handle using them in combination with the deprecated License field or license classifiers. This simplified the specification and either deferred implementation on PyPI to a later PEP, or gave PyPI discretion to enforce the stated invariants, to minimize disruption to package authors.

However, this had been left unstated since before the License-Expression field was separated from the existing License field, when validation would have been much more challenging and backward-incompatible, breaking existing packages. With that change, there was a clear consensus that the new field should be validated from the start, guaranteeing that all distributions uploaded to PyPI that declare core metadata version 2.4 or higher and have the License-Expression field will have a valid expression, which PyPI and consumers of its packages and metadata can rely upon to follow the specification here.
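A minimal sketch of the kind of upload-time check this implies, assuming metadata in email-header format. upload_errors is hypothetical, and is_valid_expression stands in for whatever SPDX validator an index would actually use. Nothing is enforced for older metadata versions or for uploads without License-Expression.

```python
from collections.abc import Callable
from email.parser import HeaderParser


def upload_errors(metadata_text: str, is_valid_expression: Callable[[str], bool]) -> list[str]:
    """Hypothetical index-side check that only enforces what the upload declares."""
    msg = HeaderParser().parsestr(metadata_text)
    version = tuple(int(part) for part in msg.get("Metadata-Version", "0").split("."))
    expression = msg.get("License-Expression")
    if version < (2, 4) or expression is None:
        return []  # older metadata, or no expression declared: nothing to enforce
    errors = []
    if not is_valid_expression(expression):
        errors.append(f"invalid License-Expression: {expression!r}")
    if msg.get("License") is not None:
        errors.append("License must not be combined with License-Expression")
    classifiers = msg.get_all("Classifier", [])
    if any(c.startswith("License ::") for c in classifiers):
        errors.append("license classifiers must not be combined with License-Expression")
    return errors
```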

The same can be extended to the new License-File field as well, to ensure that it is valid and the legally required license files are present, and thus it is lawful for PyPI, users and downstream consumers to distribute the package. (Of course, this cannot strictly guarantee that, as it ultimately relies on authors to declare the files, but it improves assurance and allows such a guarantee to be pursued in the future if the community so decides.) To be clear, this would not require that any uploaded distribution have such metadata, only that if it is declared per the new specification in PEP 639, it is assured to be valid.

Source metadata license key

Alternate possibilities related to the license key in the pyproject.toml project source metadata.

Add expression and files subkeys to table

A previous working draft of PEP 639 added expression and files subkeys to the existing license table in the project source metadata, to parallel the existing file and text subkeys. While this seemed perhaps the most obvious approach at first glance, it had several serious drawbacks relative to that ultimately taken here.

Most saliently, this means two very different types of metadata, requiring very different handling, are specified under the same top-level key. Furthermore, unlike the previous arrangement, the subkeys are not mutually exclusive: both can be specified at once, with some potentially dynamic and others static, each mapping to a different core metadata field.

Furthermore, this leads to a conflict with marking the key as dynamic (assuming that is intended to specify the [project] table keys, as PEP 621 seems to imprecisely imply, rather than core metadata fields), as either both subkeys or neither would have to be treated as dynamic. Grouping both license expressions and license files under the same key forces an “all or nothing” approach, and creates ambiguity as to user intent.

There are further downsides to this as well. Both users and tools would need to keep track of which fields are mutually exclusive with which of the others, greatly increasing cognitive and code complexity, and in turn the probability of errors. Conceptually, juxtaposing so many different fields under the same key is rather jarring, and leads to a much more complex mapping between [project] keys and core metadata fields, not in keeping with PEP 621. This causes the [project] table naming and structure to diverge further from both the core metadata and native formats of the various popular packaging tools that use it. Finally, this results in the spec being significantly more complex and convoluted to understand and implement than the alternatives.

The approach PEP 639 now takes, using the reserved top-level string value of the license key, adding a new license-files key and deprecating the license table subkeys (text and file), avoids most of the issues identified above, and results in a much clearer and cleaner design overall. It allows license and license-files to be tagged dynamic independently, separates two independent types of metadata (syntactically and semantically), restores a closer to 1:1 mapping of [project] table keys to core metadata fields, and reduces nesting by a level for both. Other than adding one extra key to the file, there was no significant apparent downside to this latter approach, so it was adopted for PEP 639.

Add an expression subkey instead of a string value

Adding just an expression subkey to the license table, instead of using the reserved top-level string value, would be more explicit for readers and writers, in line with PEP 639’s goals. However, it still has the downsides listed above that are not specific to the inclusion of the files key.

Relative to a flat string value, it adds verbosity, complexity and an extra level of nesting, and requires users and tools to remember and handle the mutual exclusivity of the subkeys and remember which are deprecated and which are not, instead of cleanly deprecating the table subkeys as a whole. Furthermore, it is less clearly the “default” choice for modern use, given users tend to gravitate toward the simplest and most obvious option. Finally, it seems reasonable to follow the suggested guidance in PEP 621, given the top-level string value was specifically reserved for this purpose.

Define a new top-level license-expression key

An earlier version of PEP 639 defined a new, top-level license-expression key under the [project] table, rather than using the reserved string value of the license key. This was seen as clearer and more explicit for readers and writers, in line with the goals of PEP 639.

Additionally, while differences from existing tool formats (and core metadata field names) have precedent in PEP 621, using a key with the same name as in most or all current tools to mean something different (and map to a different core metadata field), with distinct and incompatible syntax and semantics, does not, and could cause confusion and ambiguity for readers and authors.

Also, per the project source metadata spec, this would allow separately marking the [project] keys corresponding to the License and License-Expression metadata fields as dynamic. This avoids a potential concern with back-filling the License field from the License-Expression field, as PEP 639 currently allows, without marking license as dynamic (which could not be done for only one of the two fields, since they both map to the same top-level key).

However, community consensus favored using the top-level string value of the existing license key, as reserved for this purpose by PEP 621:

A practical string value for the license key has been purposefully left out to allow for a future PEP to specify support for SPDX expressions (the same logic applies to any sort of “type” field specifying what license the file or text represents).

This is shorter and simpler for users to remember and type, avoids adding a new top-level key while taking advantage of an existing one, guides users toward using a license expression as the default, and follows what was envisioned in the original PEP 621.

Additionally, this allows cleanly deprecating the table values without deprecating the key itself, and makes them inherently mutually exclusive without users having to remember and tools having to enforce it.

Finally, consistency with other tool formats and the underlying core metadata was not considered a sufficient priority to override the advantages of using the existing key, and the dynamic concerns were mostly mitigated by not specifying legacy license to license expression conversion at build time, explicitly specifying backfilling the License field when not dynamic, and the fact that both fields are mutually exclusive, so there is little practical need to distinguish which is dynamic.

Therefore, a top-level string value for license was adopted for PEP 639, as an earlier working draft had temporarily specified.

Add a type key to treat text as expression

Instead of using the reserved top-level string value of the license key in the [project] table, one could add a type subkey to the license table to control whether text (or a string value) is interpreted as free text or a license expression. This could make backward compatibility a little more seamless, as older tools could ignore it and always treat text as a free-text license, while newer tools would know to treat it as a license expression if type was set appropriately. Indeed, PEP 621 seems to suggest something of this sort as a possible alternative way that SPDX license expressions could be implemented.

However, all the same downsides as in the previous item apply here, including greater complexity, a more complex mapping between the project source metadata and core metadata and inconsistency between the presentation in tool config, project source metadata and core metadata, a much less clean deprecation, further bikeshedding over what to name it, and inability to mark one but not the other as dynamic, among others.

In addition, while potentially a little easier in the short term, in the long term it would mean users would always have to remember to specify the correct type to ensure their license expression is interpreted correctly, which adds work and potential for error; we could never safely change the default while being confident that users understand that what they are entering is unambiguously a license expression, with all the false positive and false negative issues described above.

Therefore, for these as well as the same reasons this approach was rejected for the core metadata in favor of a distinct License-Expression field, we similarly reject this here in favor of the reserved string value of the license key.

Must be marked dynamic to back-fill

The license key in the pyproject.toml could be required to be explicitly set to dynamic in order for the License core metadata field to be automatically back-filled from the top-level string value of the license key. This would be more explicit that the filling will be done, as strictly speaking the License field is not (and cannot be) specified directly in pyproject.toml, and would satisfy a stricter interpretation of the letter of the previous PEP 621 specification that PEP 639 revises.

However, this doesn’t seem to be necessary, because it is simply using the static, verbatim literal value of the license key, as specified strictly in PEP 639. Therefore, any conforming tool can trivially, deterministically and unambiguously derive this using only the static data in the pyproject.toml file itself.
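The claim that the back-fill is fully determined by static data can be sketched as a pure function of the [project] table, assuming, as described here, that License is filled verbatim from the top-level string value. The backfill_license name is hypothetical.

```python
def backfill_license(project: dict) -> dict[str, str]:
    """Derive license fields from the static [project] table alone."""
    value = project.get("license")
    if not isinstance(value, str) or "license" in project.get("dynamic", []):
        return {}  # nothing static to derive from
    # A verbatim copy of the static string: no heuristics, no other inputs.
    return {"License-Expression": value, "License": value}


print(backfill_license({"name": "example", "license": "MIT"}))
```

Any conforming tool running this logic on the same file gets the same answer, which is why no dynamic marker is needed.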

Furthermore, requiring it to be marked dynamic would actually add significant ambiguity, as the value could then be filled arbitrarily by other tools, which would in turn compromise and conflict with the value of the new License-Expression field; this is why such filling is explicitly prohibited by PEP 639. Not marking it as dynamic ensures it is only handled in accordance with PEP 639’s requirements.

Finally, explicitly telling users to mark it as dynamic, or not, to control filling behavior seems to be a bit of a misuse of the dynamic field as apparently intended, and prevents tools from adapting to best practices (fill, don’t fill, etc.) as they develop and evolve over time.

Source metadata license-files key

Alternatives considered for the license-files key in the pyproject.toml [project] table, primarily related to the path/glob type handling.

Add a type subkey to license-files

Instead of defining mutually exclusive paths and globs subkeys of the license-files [project] table key, we could achieve the same effect with a files subkey for the list and a type subkey for how to interpret it. However, the latter offers no real advantage over the former, while requiring more keystrokes, verbosity and complexity, leaving less flexibility to allow both, or another subkey, in the future, and requiring bikeshedding over the subkey name. Therefore, it was summarily rejected.

Only accept verbatim paths

Globs could be disallowed completely as values to the license-files key in pyproject.toml and only verbatim literal paths allowed. This would ensure that all license files are explicitly specified, all specified license files are found and included, and the source metadata is completely static in the strictest sense of the term, without tools having to inspect the rest of the project source files to determine exactly what license files will be included and what the License-File values will be. This would also modestly simplify the spec and tool implementation.

However, practicality once again beats purity here. Globs are supported and used by many existing tools for finding license files, and explicitly specifying the full path to every license file would be unnecessarily tedious for more complex projects with vendored code and dependencies. More critically, it would make it much easier to accidentally miss a required legal file, silently rendering the package illegal to distribute.

Tools can still statically and consistently determine the files to be included, based only on those glob patterns the user explicitly specified and the filenames in the package, without installing it, executing its code or even examining its files. Furthermore, tools are still explicitly allowed to warn if specified glob patterns (including full paths) don’t match any files. And, of course, sdists, wheels and others will have the full static list of files specified in their distribution metadata.

Perhaps most importantly, this would also preclude the currently specified default value, as widely used by the current most popular tools, and thus be a major break to backward compatibility, tool consistency, and safe and sane default functionality to avoid unintentional license violations. And of course, authors are welcome and encouraged to specify their license files explicitly via the paths table subkey, once they are aware of it and if it is suitable for their project and workflow.

Only accept glob patterns

Conversely, all license-files strings could be treated as glob patterns. This would slightly simplify the spec and implementation, avoid an extra level of nesting, and more closely match the configuration format of existing tools.

However, for the cost of a few characters, separate subkeys ensure users are aware whether they are entering globs or verbatim paths. Furthermore, allowing license files to be specified as literal paths avoids edge cases, such as paths containing glob characters (or characters confusingly or even maliciously similar to them, as described in PEP 672).
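The glob-character edge case is easy to demonstrate with the standard library's fnmatch, used here as a stand-in for whatever glob semantics a given tool implements. The filename is a made-up example.

```python
import glob
from fnmatch import fnmatchcase

literal_name = "LICENSE[1].txt"  # a real, if unusual, filename

# Read as a glob, "[1]" is a character class, so the pattern matches a different file...
print(fnmatchcase("LICENSE1.txt", literal_name))  # True
# ...and does not match the file it literally names.
print(fnmatchcase(literal_name, literal_name))  # False
# Escaping fixes it, but only if the tool knows the string was meant as a path.
print(fnmatchcase(literal_name, glob.escape(literal_name)))  # True
```

Escaping only helps when the tool knows the string is a literal path, which is the information a separate paths subkey provides.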

Including an explicit paths value ensures that the resulting License-File metadata is correct, complete and purely static in the strictest sense of the term, with all license paths explicitly specified in the pyproject.toml file, guaranteed to be included and with an early error should any be missing. This is not practical to do, at least without serious limitations for many workflows, if we must assume the items are glob patterns rather than literal paths.

This allows tools to locate them and know the exact values of the License-File core metadata fields without having to traverse the source tree of the project and match globs, potentially allowing easier, more efficient and reliable programmatic inspection and processing.

Therefore, given the relatively small cost and the significant benefits, this approach was not adopted.

Infer whether paths or globs

It was considered whether to simply allow specifying an array of strings directly for the license-files key, rather than making it a table with explicit paths and globs. This would be somewhat simpler and avoid an extra level of nesting, and more closely match the configuration format of existing tools. However, it was ultimately rejected in favor of separate, mutually exclusive paths and globs table subkeys.

In practice, the flat form saves only six characters in the pyproject.toml (license-files = [...] vs license-files.globs = [...]), while the explicit subkey allows the user to more explicitly declare their intent, ensures they understand how the values are going to be interpreted, and serves as an unambiguous indicator for tools to parse them as globs rather than verbatim path literals.

This, in turn, allows for more appropriate, clearly specified tool behaviors for each case, many of which would be unreliable or impossible without it, to avoid common traps, provide more helpful feedback and behave more sensibly and intuitively overall. These include, with paths, guaranteeing that each and every specified file is included and immediately raising an error if one is missing, and with globs, checking glob syntax, excluding unwanted backup, temporary, or other such files (as current tools already do), and optionally warning if a glob doesn’t match any files. This also avoids edge cases (e.g. paths that contain glob characters) and reliance on heuristics to determine interpretation—the very thing PEP 639 seeks to avoid.
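A minimal sketch of these differing behaviors, assuming the license-files table has already been parsed into a dict and that filenames holds POSIX-style paths relative to the project root. resolve_license_files and the EXCLUDED_SUFFIXES backup-file rule are hypothetical, glob-syntax checking is omitted, and fnmatch differs from typical glob semantics in that its * also matches /.

```python
import warnings
from fnmatch import fnmatchcase

# Hypothetical exclusion rule for editor backups and temporary files.
EXCLUDED_SUFFIXES = ("~", ".bak", ".orig", ".swp")


def resolve_license_files(table: dict, filenames: set[str]) -> list[str]:
    """Apply paths or globs semantics; the two subkeys are mutually exclusive."""
    if "paths" in table and "globs" in table:
        raise ValueError("license-files: specify either 'paths' or 'globs', not both")
    if "paths" in table:
        missing = [p for p in table["paths"] if p not in filenames]
        if missing:
            # Verbatim paths are a guarantee: a missing file is an immediate error.
            raise FileNotFoundError(f"license files not found: {missing}")
        return list(table["paths"])
    result: set[str] = set()
    for pattern in table.get("globs", []):
        hits = {
            name
            for name in filenames
            if fnmatchcase(name, pattern) and not name.endswith(EXCLUDED_SUFFIXES)
        }
        if not hits:
            # Globs may legitimately match nothing, so only warn.
            warnings.warn(f"license glob {pattern!r} matched no files")
        result |= hits
    return sorted(result)
```

Only the file listing is consulted, never file contents, so the result stays statically determinable.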

Also allow a flat array value

Initially, after deciding to define license-files as a table of paths and globs, thought was given to making a top-level string array under the license-files key mean one or the other (probably globs, to match most current tools). This is slightly shorter and simpler, would allow gently nudging users toward a preferred one, and allow a slightly cleaner handling of the empty case (which, at present, is treated identically for either).

However, this again only saves six characters in the best case, and there is no obvious choice, either in terms of preference (both have clear use cases and benefits) or of which one users would naturally assume.

Flat may be better than nested, but in the face of ambiguity, users may not resist the temptation to guess. Requiring users to explicitly specify one or the other ensures they are aware of how their inputs will be handled, and is more readable for others, both human and machine alike. It also makes the spec and tool implementation slightly more complicated, and it can always be added in the future, but not removed without breaking backward compatibility. And finally, for the “preferred” option, it means there is more than one obvious way to do it.

Therefore, per PEP 20, the Zen of Python, this approach is hereby rejected.

Allow both paths and globs subkeys

Allowing both paths and globs subkeys to be specified under the license-files table was considered, as it could potentially allow more flexible handling for particularly complex projects, and specify on a per-pattern rather than overall basis whether license-files entries should be treated as paths or globs.

However, the proposed approach already matches or exceeds the capabilities offered in tools’ config files, there isn’t clear demand for this, and few likely cases would benefit. Meanwhile, it would add a large amount of complexity for relatively minimal gain, in the specification, in tool implementations and in pyproject.toml itself.

There would be many more edge cases to deal with, such as how to handle files matched by both lists, and it conflicts in multiple places with the current specification for how tools should behave with one or the other, such as when no files match, guarantees of all files being included and of the file paths being explicitly, statically specified, and others.

As with the previous idea, if there is a clear need for it, it can always be allowed in the future in a backward-compatible manner (to the extent it is possible in the first place), while the same is not true of disallowing it. Therefore, it was decided to require the two subkeys to be mutually exclusive.

Rename paths subkey to files

Initially, it was considered whether to name the paths subkey of the license-files table files instead. However, paths was ultimately chosen, as calling the table subkey files resulted in duplication between the table name (license-files) and the subkey name (files), i.e. license-files.files = ["LICENSE.txt"], made it seem like the preferred/default subkey when it was not, and lacked the same parallelism with globs in describing the format of the string entry rather than what was being pointed to.

Must be marked dynamic to use defaults

It may seem outwardly sensible, at least with a particularly restrictive interpretation of PEP 621’s description of the dynamic list, to consider requiring the license-files key to be explicitly marked as dynamic in order for the default glob patterns to be used, or alternatively for license files to be matched and included at all.

However, this is merely declaring a static, strictly-specified default value for this particular key, required to be used exactly by all conforming tools (so long as it is not marked dynamic, negating this argument entirely), and is no less static than any other set of glob patterns the user themself may specify. Furthermore, the resulting License-File core metadata values can still be determined with only a list of files in the source, without installing or executing any of the code, or even inspecting file contents.

Moreover, even if this were not so, practicality would trump purity, as this interpretation would be strictly backward-incompatible with the existing format and inconsistent with the behavior of existing tools. Further, this would create a very serious and likely risk of a large number of projects unknowingly no longer including legally mandatory license files, making their distribution technically illegal, and is thus not a sane, much less a sensible, default.

Finally, aside from adding an additional line of default-required boilerplate to the file, not defining the default as dynamic allows authors to clearly and unambiguously indicate when their build/packaging tools are going to be handling the inclusion of license files themselves rather than strictly conforming to the project source metadata portions of PEP 639; to do otherwise would defeat the primary purpose of the dynamic list as a marker and escape hatch.

License file paths

Alternatives related to the paths and locations of license files in the source and built distributions.

Flatten license files in subdirectories

Previous drafts of PEP 639 were silent on the issue of handling license files in subdirectories. Currently, the Wheel and (following its example) Setuptools projects flatten all license files into the .dist-info directory without preserving the source subdirectory hierarchy.

While this is the simplest approach and matches existing ad hoc practice, it can result in name conflicts, with license files clobbering one another and no obvious defined behavior for resolving them, leaving the package legally un-distributable without any clear indication to users that their specified license files have not been included.

Furthermore, this leads to inconsistent relative file paths for non-root license files between the source, sdist and wheel, and prevents the paths given in the “static” [project] table metadata from being truly static, as they need to be flattened, and may potentially overwrite one another. Finally, the source directory structure often implies valuable information about what the licenses apply to, and where to find them in the source, which is lost when flattening them and far from trivial to reconstruct.

To resolve this, the PEP now proposes, as did contributors on both of the above issues, reproducing the source directory structure of the original license files inside the .dist-info directory. This would fully resolve the concerns above, with the only downside being a more nested .dist-info directory. There is still a risk of collision with edge-case custom filenames (e.g. RECORD, METADATA), but that is also the case with the previous approach, and in fact, with fewer files flattened into the root, this approach would reduce that risk. Furthermore, the following proposal, rooting the license files under a licenses subdirectory, eliminates both collisions and the clutter problem entirely.

Resolve name conflicts differently

Rather than preserving the source directory structure for license files inside the .dist-info directory, we could specify some other mechanism for conflict resolution, such as prepending or appending the parent directory name to the license filename, traversing up the tree until the name is unique, to avoid excessively nested directories.
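To show what this rejected alternative would involve, here is a sketch of one possible reading of it: repeatedly prefix colliding names with more parent directory components until every name is unique. The function name disambiguate_by_parent and the "_" separator are hypothetical choices; nothing here was ever specified.

```python
from pathlib import PurePosixPath


def disambiguate_by_parent(license_paths, sep="_"):
    """Hypothetical rejected scheme: prefix parent dir names until names are unique."""
    parts = {p: PurePosixPath(p).parts for p in license_paths}
    depth = {p: 1 for p in license_paths}  # how many trailing components to keep

    def name(p):
        return sep.join(parts[p][-depth[p]:])

    while True:
        groups = {}
        for p in license_paths:
            groups.setdefault(name(p), []).append(p)
        clashes = [ps for ps in groups.values() if len(ps) > 1]
        if not clashes:
            return {p: name(p) for p in license_paths}
        progressed = False
        for ps in clashes:
            for p in ps:
                if depth[p] < len(parts[p]):  # can still climb the tree
                    depth[p] += 1
                    progressed = True
        if not progressed:
            raise ValueError(f"cannot disambiguate: {clashes}")


print(disambiguate_by_parent(["LICENSE", "vendor/foo/LICENSE", "vendor/bar/LICENSE"]))
```

Even this small sketch needs an iterative loop, a separator choice and a failure mode, and the resulting names still differ from the source paths, which illustrates the added complexity and path-inconsistency concerns below.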

However, this would not address the path consistency issues, would require much more discussion, coordination and bikeshedding, and further complicate the specification and the implementations. Therefore, it was rejected in favor of the simpler and more obvious solution of just preserving the source subdirectory layout, as many stakeholders have already advocated for.

Dump directly in .dist-info

Previously, the included license files were stored directly in the top-level .dist-info directory of built wheels and installed projects. This followed existing ad hoc practice, ensured most existing wheels currently using this feature would match new ones, and kept the specification simpler, with the license files always being stored in the same location relative to the core metadata regardless of distribution type.

However, this leads to a more cluttered .dist-info directory, littered with arbitrary license files and subdirectories, as opposed to separating licenses into their own namespace (which per the Zen of Python, PEP 20, are “one honking great idea”). While currently small, there is still a risk of collision with specific custom license filenames (e.g. RECORD, METADATA) in the .dist-info directory, which would only increase if and when additional files were specified here, and would require carefully limiting the potential filenames used to avoid likely conflicts with those of license-related files. Finally, putting licenses into their own specified subdirectory would allow humans and tools to quickly, easily and correctly list, copy and manipulate all of them at once (such as in distro packaging, legal checks, etc) without having to reference each of their paths from the core metadata.
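As a sketch of the "list them all at once" benefit, the following uses the standard importlib.metadata API to enumerate an installed distribution's files and keep only those under its .dist-info/licenses/ subdirectory, without consulting License-File entries. The helper names license_entries and installed_license_files are hypothetical, and the sketch assumes the project was installed from a wheel that uses the licenses subdirectory layout.

```python
from importlib.metadata import PackageNotFoundError, distribution
from pathlib import PurePosixPath


def license_entries(record_paths):
    """Keep RECORD-style paths under <name>.dist-info/licenses/, relative to it."""
    result = []
    for raw in record_paths:
        parts = PurePosixPath(raw).parts
        if len(parts) > 2 and parts[0].endswith(".dist-info") and parts[1] == "licenses":
            result.append(str(PurePosixPath(*parts[2:])))
    return result


def installed_license_files(dist_name):
    """License paths for an installed distribution, or [] if none are found."""
    try:
        files = distribution(dist_name).files or []
    except PackageNotFoundError:
        return []
    return license_entries(str(f) for f in files)
```

For example, license_entries(["spam-1.0.dist-info/METADATA", "spam-1.0.dist-info/licenses/LICENSE"]) keeps only "LICENSE".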

Therefore, now is a prudent time to specify an alternate approach. The simplest and most obvious solution, as suggested by several contributors on the Wheel and Setuptools implementation issues, is to root the license files relative to a licenses subdirectory of .dist-info. This is simple to implement and solves all the problems noted here, without clear significant drawbacks relative to other, more complex options.
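A minimal sketch of the resulting path mapping: a license file's path relative to the project root is kept as-is and placed under the licenses subdirectory of .dist-info. The function name wheel_license_path is hypothetical, the caller is assumed to supply an already-normalized .dist-info directory name, and rejecting absolute paths and ".." components is a defensive assumption of this sketch.

```python
from pathlib import PurePosixPath


def wheel_license_path(dist_info_dir: str, source_path: str) -> str:
    """Map a project-relative license path to its location inside the wheel."""
    rel = PurePosixPath(source_path)
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"license path must be relative to the project root: {source_path}")
    # Subdirectories are preserved; everything lives in its own licenses/ namespace.
    return str(PurePosixPath(dist_info_dir, "licenses", rel))


print(wheel_license_path("spam-1.0.dist-info", "vendor/foo/LICENSE"))
```

Since every license file sits below licenses/, even a license file literally named RECORD or METADATA cannot overwrite the core metadata files.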

It does make the specification a bit more complex and less elegant, but implementation should remain equally simple. It also means that wheels produced following this change will have differently located licenses than those produced before it. However, as this was already true for license files in subdirectories, and until PEP 639 there was no way of discovering these files or accessing them programmatically, this doesn't seem likely to pose significant problems in practice. Moreover, such a change will be much harder, if not impossible, to make later, once the status quo is standardized, tools rely on the current behavior, and there is much greater uptake of not only including license files but also accessing them through the core metadata. If we are going to change it, now is the time, particularly since we are already introducing an edge-case change in how license files in subdirectories are handled, along with other refinements.

Therefore, this approach has been incorporated into current drafts of PEP 639.

Add new licenses category to wheel

Instead of defining a root license directory (licenses) inside the core metadata directory (.dist-info) for wheels, we could instead define a new category (and, presumably, a corresponding install scheme), similar to the others currently included under .data in the wheel archive, specifically for license files, called (e.g.) licenses. This was mentioned by the wheel creator, and would allow installing licenses somewhere more platform-appropriate and flexible than just the .dist-info directory in the site path, and potentially be conceptually cleaner than including them there.

However, at present, PEP 639 does not implement this idea, and it is deferred to a future one. It would add significant complexity and friction to PEP 639, which is primarily concerned with standardizing existing practice and updating the core metadata specification. Furthermore, doing so would likely require modifying sysconfig and the install schemes specified therein, alongside Wheel, Installer and other tools, which would be a non-trivial undertaking. While potentially slightly more complex for repackagers (such as those for Linux distributions), the current proposal still ensures all license files are included, and in a single dedicated directory (which can easily be copied or relocated downstream), and thus should still greatly improve the status quo in this regard without the attendant complexity.

In addition, this approach is not fully backwards compatible (since it isn’t transparent to tools that simply extract the wheel), is a greater departure from existing practice and would lead to more inconsistent license install locations from wheels of different versions. Finally, this would mean licenses would not be installed as proximately to their associated code, there would be more variability in the license root path across platforms and between built distributions and installed projects, accessing installed licenses programmatically would be more difficult, and a suitable install location and method would need to be created, discussed and decided that would avoid name clashes.

Therefore, to keep PEP 639 in scope, the current approach was retained.

Name the subdirectory license_files

Both licenses and license_files have been suggested as potential names for the root license directory inside .dist-info of wheels and installed projects. An initial draft of the PEP specified the latter, as it is slightly clearer and consistent with the name of the core metadata field (License-File) and the [project] table key (license-files). However, the current version of the PEP adopts the licenses name, due to a general preference by the community for its shorter length, greater simplicity and the lack of a separator character (_, -, etc.).

Other ideas

Miscellaneous proposals, possibilities and discussion points that were ultimately not adopted.

Map identifiers to license files

This would require using a mapping (as two parallel lists would be too prone to alignment errors), which would add extra complexity to how licenses are documented and add an additional nesting level.

A mapping would be needed because it cannot be guaranteed that every expression (key) has exactly one license file associated with it. Sometimes a single file covers more than one identifier (e.g. a GPL license with an exception may be in a single file), while sometimes one license spans several files (e.g. an Apache license LICENSE and its NOTICE file are two distinct files). For most common cases, a single license expression and one or more license files would be perfectly adequate. In the rarer and more complex cases where many licenses are involved, authors can still safely use the fields specified here, just with a slight loss of clarity by not specifying which text file(s) map to which license identifier (though this should be clear in practice, given each license identifier has corresponding SPDX-registered full license text), while not forcing the more complex data model (a mapping) on the large majority of users who do not need or want it.

We could of course have a data field with multiple possible value types (it's a string, it's a list, it's a mapping!), but this could be a source of confusion. This is what has been done, for instance, in npm (historically) and in Rubygems (still today), and as a result tools need to test the type of the metadata field before using it in code, while users are confused about when to use a list or a string. Therefore, this approach is rejected.
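The burden on consumers can be seen in a small sketch: with a hypothetical field that may be a string, a list or a mapping, every reader has to branch on type before doing anything useful. The function name license_expressions and the three accepted shapes are illustrative assumptions, not a description of npm's or Rubygems' actual formats.

```python
from typing import Any


def license_expressions(value: Any) -> list[str]:
    """Hypothetical reader for a multi-typed license field."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, list):
        return [str(v) for v in value]
    if isinstance(value, dict):
        # Mapping form: expression -> list of license files.
        return [str(k) for k in value]
    raise TypeError(f"unsupported license field type: {type(value).__name__}")
```

With a single string-valued License-Expression plus a separate list of License-File entries, none of this branching is needed.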

Map identifiers to source files

As discussed previously, file-level notices are out of scope for PEP 639, and the existing SPDX-License-Identifier convention can already be used if this is needed without further specification here.
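For reference, the SPDX-License-Identifier convention places a comment such as "# SPDX-License-Identifier: MIT" near the top of a source file. The following is a minimal sketch of reading it back; the function name file_license, the regular expression and the choice to scan only the first few lines are assumptions of this example rather than anything specified by PEP 639.

```python
import re

# Matches "SPDX-License-Identifier: <expr>", tolerating a trailing "*/" (C-style comments).
SPDX_TAG = re.compile(
    r"SPDX-License-Identifier:\s*(?P<expr>.+?)\s*(?:\*/)?\s*$", re.MULTILINE
)


def file_license(source_text: str, max_lines: int = 10):
    """Return the SPDX expression tagged in the file header, or None if absent."""
    head = "\n".join(source_text.splitlines()[:max_lines])
    match = SPDX_TAG.search(head)
    return match.group("expr") if match else None


print(file_license("# SPDX-License-Identifier: MIT OR Apache-2.0\nimport os\n"))
```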

Don’t freeze compatibility with a specific SPDX version

PEP 639 could omit specifying a specific SPDX specification version, or one for the list of valid license identifiers, which would allow more flexible updates as the specification evolves without another PEP or equivalent.

However, serious concerns were expressed about a future SPDX update breaking compatibility with existing expressions and identifiers, leaving current packages with invalid metadata per the definition in PEP 639. Requiring compatibility with a specific version of these specifications here and a PEP or similar process to update it avoids this contingency, and follows the practice of other packaging ecosystems.

Therefore, it was decided to specify a minimum version and require tools to be compatible with it, while still allowing updates so long as they don't break backward compatibility. This enables tools to immediately take advantage of improvements and accept new licenses, while remaining backwards compatible with the version specified here, balancing flexibility and compatibility.

Different licenses for source and binary distributions

As an additional use case, it was asked whether it was in scope for this PEP to handle cases where the license expression for a binary distribution (wheel) is different from that for a source distribution (sdist), such as in cases of non-pure-Python packages that compile and bundle binaries under different licenses than the project itself. An example cited was PyTorch, which contains CUDA from Nvidia, which is freely distributable but not open source. NumPy and SciPy also had similar issues, as reported by the original author of PEP 639 and now resolved for those cases.

However, given the inherent complexity here and a lack of an obvious mechanism to do so, the fact that each wheel would need its own license information, lack of support on PyPI for exposing license info on a per-distribution archive basis, and the relatively niche use case, it was determined to be out of scope for PEP 639, and left to a future PEP to resolve if sufficient need and interest exists and an appropriate mechanism can be found.