PEP 639 – Improving License Clarity with Better Package Metadata

Author: Philippe Ombredanne, C.A.M. Gerlach <CAM.Gerlach at Gerlach.CAM>, Karolina Surma <karolina.surma at gazeta.pl>

PEP-Delegate: Brett Cannon

Discussions-To: Discourse thread

Status: Provisional

Type: Standards Track

Topic: Packaging

Created: 15-Aug-2019

Post-History: 15-Aug-2019, 17-Dec-2021, 10-May-2024

Resolution: Discourse message



Provisional Acceptance

This PEP has been provisionally accepted, with the following required conditions before the PEP is made Final:

  1. An implementation of the PEP in two build back-ends.
  2. An implementation of the PEP in PyPI.

Abstract

This PEP defines a specification for how licenses are documented in Python projects.

To achieve that, it:

This will make license declaration simpler and less ambiguous for package authors to create, end users to understand, and tools to programmatically process.

The changes will update the Core Metadata specification to version 2.4.

Goals

This PEP’s scope is limited to covering new mechanisms for documenting the license of a distribution package, specifically defining:

The changes that this PEP requires have been designed to minimize impact and maximize backward compatibility.

Non-Goals

This PEP doesn’t recommend any particular license to be chosen by any particular package author.

If projects decide not to use the new fields, no additional restrictions are imposed by this PEP when uploading to PyPI.

This PEP also is not about license documentation for individual files, though this is a surveyed topic in an appendix, nor does it intend to cover cases where the source distribution and binary distribution packages don’t have the same licenses.

Motivation

Software must be licensed in order for anyone other than its creator to download, use, share and modify it. Today, there are multiple fields where licenses are documented in Core Metadata, and there are limitations to what can be expressed in each of them. This often leads to confusion both for package authors and end users, including distribution re-packagers.

This has triggered a number of license-related discussions and issues, including on outdated and ambiguous PyPI classifiers, license interoperability with other ecosystems, too many confusing license metadata options, limited support for license files in the Wheel project, and the lack of precise license metadata.

As a result, on average, Python packages tend to have more ambiguous and missing license information than other common ecosystems. This is supported by the statistics page of the ClearlyDefined project, an Open Source Initiative effort to help improve licensing clarity of other FOSS projects, covering all packages from PyPI, Maven, npm and Rubygems.

The current license classifiers could be extended to include the full range of the SPDX identifiers while deprecating the ambiguous classifiers (such as License :: OSI Approved :: BSD License).

However, there are multiple arguments against such an approach:

Rationale

A survey was conducted to map the existing license metadata definitions in the Python ecosystem and a variety of other packaging systems, Linux distributions, language ecosystems and applications.

The takeaways from the survey have guided the recommendations of this PEP:

Therefore, this PEP introduces two new Core Metadata fields:

Furthermore, this specification builds upon existing practice in the Setuptools and Wheel projects. An up-to-date version of the current draft of this PEP is implemented in the Hatch packaging tool, and an earlier draft of the license files portion is implemented in Setuptools.

Terminology

The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

License terms

The license-related terminology draws heavily from the SPDX Project, particularly license identifier and license expression.

license classifier

A PyPI Trove classifier (as described in the Core Metadata specification) which begins with License ::.

license expression

SPDX expression

A string with valid SPDX license expression syntax including one or more SPDX license identifier(s), which describes a Project’s license(s) and how they inter-relate. Examples: GPL-3.0-or-later, MIT AND (Apache-2.0 OR BSD-2-clause)

license identifier

SPDX identifier

A valid SPDX short-form license identifier, as described in the Add License-Expression field section of this PEP. This includes all valid SPDX identifiers and the custom LicenseRef-[idstring] strings conforming to the SPDX specification, clause 10.1. Examples: MIT, GPL-3.0-only, LicenseRef-My-Custom-License

root license directory

license directory

The directory under which license files are stored in a project source tree, distribution archive or installed project. Also, the root directory that their paths recorded in the License-File Core Metadata field are relative to. Defined to be the project root directory for a project source tree or source distribution; and a subdirectory named licenses of the directory containing the built metadata (i.e., the .dist-info/licenses directory) for a Built Distribution or installed project.

Specification

The changes necessary to implement this PEP include:

Note that the guidance on errors and warnings is for tools’ default behavior; they MAY operate more strictly if users explicitly configure them to do so, such as by a CLI flag or a configuration option.

SPDX license expression syntax

This PEP adopts the SPDX license expression syntax as documented in the SPDX specification, either Version 2.2 or a later compatible version.

A license expression can use the following license identifiers:

Examples of valid SPDX expressions:

    MIT
    BSD-3-Clause
    MIT AND (Apache-2.0 OR BSD-2-Clause)
    MIT OR GPL-2.0-or-later OR (FSFUL AND BSD-2-Clause)
    GPL-3.0-only WITH Classpath-Exception-2.0 OR BSD-3-Clause
    LicenseRef-Special-License OR CC0-1.0 OR Unlicense
    LicenseRef-Proprietary

Examples of invalid SPDX expressions:

    Use-it-after-midnight
    Apache-2.0 OR 2-BSD-Clause
    LicenseRef-License with spaces
    LicenseRef-License_with_underscores
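The last two invalid examples fail because of the characters allowed in a custom identifier. As a minimal sketch, assuming the SPDX rule that the idstring after LicenseRef- consists only of ASCII letters, digits, "-" and ".", a check could look like the following. The function name is hypothetical, the prefix is matched case-sensitively as a simplifying assumption, and identifiers from the SPDX License List itself are not covered here.

```python
import re

# Assumption: idstring = one or more ASCII letters, digits, "-" or ".".
_LICENSE_REF = re.compile(r"LicenseRef-[A-Za-z0-9.\-]+")


def is_custom_identifier(identifier):
    """Return True if `identifier` looks like a valid LicenseRef-[idstring]."""
    return _LICENSE_REF.fullmatch(identifier) is not None
```

A full validator would additionally look up every other identifier in the SPDX License List and parse the AND, OR and WITH operators.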

Core Metadata

The error and warning guidance in this section applies to build and publishing tools; end-user-facing install tools MAY be less strict than mentioned here when encountering malformed metadata that does not conform to this specification.

As it adds new fields, this PEP updates the Core Metadata version to 2.4.

Add License-Expression field

The License-Expression optional Core Metadata field is specified to contain a text string that is a valid SPDX license expression, as defined above.

Build and publishing tools SHOULD check that the License-Expression field contains a valid SPDX expression, including the validity of the particular license identifiers (as defined above). Tools MAY halt execution and raise an error when an invalid expression is found. If tools choose to validate the SPDX expression, they also SHOULD store a case-normalized version of the License-Expression field using the reference case for each SPDX license identifier and uppercase for the AND, OR and WITH keywords. Tools SHOULD report a warning and publishing tools MAY raise an error if one or more license identifiers have been marked as deprecated in the SPDX License List.
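Case normalization can be illustrated with a small sketch. It is not a complete SPDX parser: the lookup table below is a hypothetical, deliberately tiny stand-in for the full SPDX License List, the function name is invented for illustration, and operator placement is not checked.

```python
import re

# Hypothetical excerpt; a real tool would load the full SPDX License List.
REFERENCE_CASE = {
    "mit": "MIT",
    "apache-2.0": "Apache-2.0",
    "bsd-2-clause": "BSD-2-Clause",
    "gpl-3.0-or-later": "GPL-3.0-or-later",
}
KEYWORDS = {"and", "or", "with"}
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def normalize_expression(expression):
    """Return `expression` with reference-case identifiers and uppercase keywords."""
    normalized = []
    for token in TOKEN.findall(expression):
        lowered = token.lower()
        if token in ("(", ")"):
            normalized.append(token)
        elif lowered in KEYWORDS:
            normalized.append(token.upper())
        elif lowered in REFERENCE_CASE:
            normalized.append(REFERENCE_CASE[lowered])
        elif token.startswith("LicenseRef-"):
            # Custom identifiers are passed through unchanged in this sketch.
            normalized.append(token)
        else:
            raise ValueError(f"unknown license identifier: {token!r}")
    # Re-join with single spaces, but keep parentheses tight.
    return " ".join(normalized).replace("( ", "(").replace(" )", ")")
```

For instance, the input "mit and (apache-2.0 or bsd-2-clause)" is stored as "MIT AND (Apache-2.0 OR BSD-2-Clause)", while an unknown identifier raises an error.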

For all newly-uploaded distribution archives that include a License-Expression field, the Python Package Index (PyPI) MUST validate that they contain a valid, case-normalized license expression with valid identifiers (as defined above) and MUST reject uploads that do not. Custom license identifiers which conform to the SPDX specification are considered valid. PyPI MAY reject an upload for using a deprecated license identifier, so long as it was deprecated as of the above-mentioned SPDX License List version.

Add License-File field

License-File is an optional Core Metadata field. Each instance contains the string representation of the path of a license-related file. The path is located within the project source tree, relative to the project root directory. It is a multi-use field that may appear zero or more times and each instance lists the path to one such file. Files specified under this field could include license text, author/attribution information, or other legal notices that need to be distributed with the package.

As specified by this PEP, its value is also that file’s path relative to the root license directory in both installed projects and the standardized Distribution Package types.

If a License-File is listed in a Source Distribution or Built Distribution’s Core Metadata:

Build tools MAY and publishing tools SHOULD produce an informative warning if a built distribution’s metadata contains no License-File entries, and publishing tools MAY but build tools MUST NOT raise an error.

For all newly-uploaded distribution archives that include one or more License-File fields in their Core Metadata and declare a Metadata-Version of 2.4 or higher, PyPI SHOULD validate that all specified files are present in that distribution archive, and MUST reject uploads that do not validate.

Deprecate License field

The legacy unstructured-text License Core Metadata field is deprecated and replaced by the new License-Expression field. The fields are mutually exclusive. Tools which generate Core Metadata MUST NOT create both these fields. Tools which read Core Metadata, when dealing with both these fields present at the same time, MUST read the value of License-Expression and MUST disregard the value of the License field.
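The precedence rule for tools that read Core Metadata can be sketched as follows. This is a minimal illustration that parses the metadata with the standard library email parser; the function name and return shape are hypothetical.

```python
from email.parser import Parser


def effective_license(metadata_text):
    """Return (field_name, value) for the license field a reader should honor."""
    metadata = Parser().parsestr(metadata_text)
    expression = metadata.get("License-Expression")
    if expression is not None:
        # License-Expression wins; any License value is disregarded.
        return ("License-Expression", expression)
    legacy = metadata.get("License")
    if legacy is not None:
        return ("License", legacy)
    return None
```

When both fields are present, the sketch returns only the License-Expression value and never inspects License.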

If only the License field is present, tools MAY issue a warning informing users it is deprecated and recommending License-Expression instead.

For all newly-uploaded distribution archives that include a License-Expression field, the Python Package Index (PyPI) MUST reject any that specify both License and License-Expression fields.

The License field may be removed from a new version of the specification in a future PEP.

Deprecate license classifiers

Using license classifiers in the Classifier Core Metadata field (described in the Core Metadata specification) is deprecated and replaced by the more precise License-Expression field.

If the License-Expression field is present, build tools MAY raise an error if one or more license classifiers is included in a Classifier field, and MUST NOT add such classifiers themselves.

Otherwise, if this field contains a license classifier, tools MAY issue a warning informing users such classifiers are deprecated, and recommending License-Expression instead. For compatibility with existing publishing and installation processes, the presence of license classifiers SHOULD NOT raise an error unless License-Expression is also provided.

New license classifiers MUST NOT be added to PyPI; users needing them SHOULD use the License-Expression field instead. License classifiers may be removed from a new version of the specification in a future PEP.
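As an illustration of the classifier rules in this section, the sketch below shows one possible decision function for a build tool that opts in to the stricter optional behaviors (the error and the warning are both MAY-level). The function name and the returned labels are hypothetical.

```python
def review_classifiers(license_expression, classifiers):
    """Classify the outcome as "ok", "warning" or "error".

    `license_expression` is the License-Expression value or None;
    `classifiers` is the list of Classifier field values.
    """
    license_classifiers = [c for c in classifiers if c.startswith("License ::")]
    if not license_classifiers:
        return "ok"
    if license_expression is not None:
        # Both present: a build tool MAY raise an error.
        return "error"
    # Only classifiers present: a tool MAY warn about the deprecation.
    return "warning"
```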

Project source metadata

This PEP specifies changes to the project’s source metadata under a [project] table in the pyproject.toml file.

Add string value to license key

The license key in the [project] table is defined to contain a top-level string value. It is a valid SPDX license expression as defined in this PEP. Its value maps to the License-Expression field in the core metadata.

Build tools SHOULD validate and perform case normalization of the expression as described in the Add License-Expression field section, outputting an error or warning as specified.

Examples:

    [project]
    license = "MIT"

    [project]
    license = "MIT AND (Apache-2.0 OR BSD-2-clause)"

    [project]
    license = "MIT OR GPL-2.0-or-later OR (FSFUL AND BSD-2-Clause)"

    [project]
    license = "LicenseRef-Proprietary"

Add license-files key

A new license-files key is added to the [project] table for specifying paths in the project source tree relative to pyproject.toml to file(s) containing licenses and other legal notices to be distributed with the package. It corresponds to the License-File fields in the Core Metadata.

Its value is an array of strings which MUST contain valid glob patterns, as specified below:

Any characters or character sequences not covered by this specification are invalid. Projects MUST NOT use such values. Tools consuming this field SHOULD reject invalid values with an error.

Tools MUST assume that license file content is valid UTF-8 encoded text, and SHOULD validate this and raise an error if it is not.

Literal paths (e.g. LICENSE) are treated as valid globs which means they can also be defined.

Build tools:

If the license-files key is present and is set to a value of an empty array, then tools MUST NOT include any license files and MUST NOT raise an error.

Examples of valid license files declaration:

    [project]
    license-files = ["LICEN[CS]E*", "AUTHORS*"]

    [project]
    license-files = ["licenses/LICENSE.MIT", "licenses/LICENSE.CC0"]

    [project]
    license-files = ["LICENSE.txt", "licenses/*"]

    [project]
    license-files = []

Examples of invalid license files declaration:

    [project]
    license-files = ["..\LICENSE.MIT"]

Reason: .. must not be used. \ is an invalid path delimiter, / must be used.

    [project]
    license-files = ["LICEN{CSE*"]

Reason: “LICEN{CSE*” is not a valid glob.
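The reasons given for the invalid examples suggest a simple pre-check that a tool could run before expanding any pattern. The sketch below is a rough approximation, not the normative rule set: the allowed alphabet (alphanumerics, "_", "-", ".", "/", and the glob characters "*", "?", "[", "]") and the ban on a leading slash are assumptions, and the function name is hypothetical.

```python
import re

# Assumed allowed alphabet for a license-files glob pattern.
_ALLOWED = re.compile(r"[A-Za-z0-9_.\-/*?\[\]]+")


def check_license_glob(pattern):
    """Return a reason string if `pattern` looks invalid, otherwise None."""
    if "\\" in pattern:
        return "\\ is an invalid path delimiter, / must be used"
    if pattern.startswith("/"):
        return "pattern must be relative to the project root"
    if ".." in pattern.split("/"):
        return ".. must not be used"
    if not _ALLOWED.fullmatch(pattern):
        return "pattern contains characters that are not allowed"
    if pattern.count("[") != pattern.count("]"):
        return "unbalanced character range"
    return None
```

This sketch reports only the first problem it finds, so a pattern that breaks several rules at once yields a single reason.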

Deprecate license key table subkeys

Table values for the license key in the [project] table, including the text and file table subkeys, are now deprecated. If the new license-files key is present, build tools MUST raise an error if the license key is defined and has a value other than a single top-level string.

If the new license-files key is not present and the text subkey is present in a license table, tools SHOULD issue a warning informing users it is deprecated and recommending a license expression as a top-level string key instead.

Likewise, if the new license-files key is not present and the file subkey is present in the license table, tools SHOULD issue a warning informing users it is deprecated and recommending the license-files key instead.

If the specified license file is present in the source tree, build tools SHOULD use it to fill the License-File field in the core metadata, and MUST include the specified file as if it were specified in a license-file field. If the file does not exist at the specified path, tools MUST raise an informative error as previously specified.

Table values for the license key MAY be removed from a new version of the specification in a future PEP.

License files in project formats

A few additions will be made to the existing specifications.

Project source trees

Per the Project source metadata section, the Declaring Project Metadata specification will be updated to reflect that license file paths MUST be relative to the project root directory; i.e. the directory containing the pyproject.toml (or equivalently, other legacy project configuration, e.g. setup.py, setup.cfg, etc).

Source distributions (sdists)

The sdist specification will be updated to reflect that if the Metadata-Version is 2.4 or greater, the sdist MUST contain any license files specified by the License-File field in the PKG-INFO at their respective paths relative to the root directory of the sdist (containing the pyproject.toml and the PKG-INFO Core Metadata).

Built distributions (wheels)

The Wheel specification will be updated to reflect that if the Metadata-Version is 2.4 or greater and one or more License-File fields is specified, the .dist-info directory MUST contain a licenses subdirectory, which MUST contain the files listed in the License-File fields in the METADATA file at their respective paths relative to the licenses directory.
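A check of this wheel layout can be sketched with the standard library alone. The function name is hypothetical and the sketch assumes a well-formed wheel with exactly one top-level .dist-info directory; it does not inspect Metadata-Version.

```python
import zipfile
from email.parser import Parser


def missing_license_files(wheel_path):
    """Return License-File paths that are absent from .dist-info/licenses/."""
    with zipfile.ZipFile(wheel_path) as wheel:
        names = set(wheel.namelist())
        metadata_name = next(
            name
            for name in names
            if name.count("/") == 1 and name.endswith(".dist-info/METADATA")
        )
        dist_info = metadata_name.rsplit("/", 1)[0]
        metadata = Parser().parsestr(wheel.read(metadata_name).decode("utf-8"))
    declared = metadata.get_all("License-File", [])
    return [path for path in declared if f"{dist_info}/licenses/{path}" not in names]
```

An empty result means every declared file is present at its expected path under the licenses directory.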

Installed projects

The Recording Installed Projects specification will be updated to reflect that if the Metadata-Version is 2.4 or greater and one or more License-File fields is specified, the .dist-info directory MUST contain a licenses subdirectory which MUST contain the files listed in the License-File fields in the METADATA file at their respective paths relative to the licenses directory, and that any files in this directory MUST be copied from wheels by install tools.

Converting legacy metadata

Tools MUST NOT use the contents of the license.text [project] key (or equivalent tool-specific format), license classifiers or the value of the Core Metadata License field to fill the top-level string value of the license key or the Core Metadata License-Expression field without informing the user and requiring unambiguous, affirmative user action to select and confirm the desired license expression value before proceeding.

Tool authors who need to automatically convert license classifiers to SPDX identifiers can use the recommendation prepared by the PEP authors.

Backwards Compatibility

Adding a new License-Expression Core Metadata field and a top-level string value for the license key in the pyproject.toml [project] table unambiguously means support for the specification in this PEP. This avoids the risk of new tooling misinterpreting a license expression as a free-form license description or vice versa.

The legacy deprecated Core Metadata License field, license key table subkeys (text and file) in the pyproject.toml [project] table and license classifiers retain backwards compatibility. A removal is left to a future PEP and a new version of the Core Metadata specification.

Specification of the new License-File Core Metadata field and adding the files in the distribution is designed to be largely backwards-compatible with the existing use of that field in many packaging tools. The new license-files key in the [project] table of pyproject.toml will only have an effect once users and tools adopt it.

This PEP specifies that license files should be placed in a dedicated licenses subdir of the .dist-info directory. This is new and ensures that wheels following this PEP will have differently-located licenses relative to those produced via the previous installer-specific behavior. This is further supported by a new metadata version.

This also resolves current issues where license files are accidentally replaced if they have the same names in different places, making wheels undistributable without noticing. It also prevents conflicts with other metadata files in the same directory.

The additions will be made to the source distribution (sdist), built distribution (wheel) and installed project specifications. They document behaviors allowed under their current specifications, and gate them behind the new metadata version.

This PEP proposes PyPI implement validation of the new License-Expression and License-File fields, which has no effect on new and existing packages uploaded unless they explicitly opt in to using these new fields and fail to follow the specification correctly. Therefore, this does not have a backward compatibility impact, and guarantees forward compatibility by ensuring all distributions uploaded to PyPI with the new fields conform to the specification.

Security Implications

This PEP has no foreseen security implications: the License-Expression field is a plain string and the License-File fields are file paths. Neither introduces any known new security concerns.

How to Teach This

A majority of packages use a single license which makes the case simple: a single license identifier is a valid license expression.

Users of packaging tools will learn the valid license expression of their package through the messages issued by the tools when they detect invalid ones, or when the deprecated License field or license classifiers are used.

If an invalid License-Expression is used, the users will not be able to publish their package to PyPI and an error message will help them understand they need to use SPDX identifiers. It will be possible to generate a distribution with incorrect license metadata, but not to publish one on PyPI or any other index server that enforces License-Expression validity. For authors using the now-deprecated License field or license classifiers, packaging tools may warn them and inform them of the replacement, License-Expression.

Tools may also help with the conversion and suggest a license expression in many common cases:

Reference Implementation

Tools will need to support parsing and validating license expressions in the License-Expression field if they decide to implement this part of the specification. It’s up to the tools whether they prefer to implement the validation on their side (e.g. like hatch) or use one of the available Python libraries (e.g. license-expression). This PEP does not mandate using any specific library and leaves it to the tool authors to choose the best implementation for their projects.

Rejected Ideas

Many alternative ideas were proposed and, after careful consideration, rejected. The exhaustive list, including the rationale for each rejection, can be found on a separate page.

Appendices

A list of auxiliary documents is provided:

Acknowledgments

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.