Recent discussions have been about type hints, which are orthogonal to the PEP, so things seem to have reached a steady state.
Was there anything else that needed clarification, Guido, or are you ready to pronounce? Or did you want to wait until the language summit? Or did you want to assign a BDFL delegate?
On Fri, 13 May 2016 at 11:37 Brett Cannon <brett@python.org> wrote:
Biggest changes since the second draft:
- Resolve __fspath__() from the type, not the instance (for Guido)
- Updated the TypeError messages to say "os.PathLike object" instead of "path object" (implicitly for Steven)
- TODO item to define "path-like" in the glossary (for Steven)
- Various more things added to Rejected Ideas
- Added Koos as a co-author (for Koos :)
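
To illustrate the first item, here is a minimal sketch of how the two lookup strategies can differ. The class and helper names (``Sneaky``, ``fspath_from_instance``, ``fspath_from_type``) are hypothetical and exist only for this illustration; they are not part of the PEP.

```python
class Sneaky:
    """Hypothetical class used only to illustrate the lookup difference."""

    def __fspath__(self):
        return "/from/the/type"


def fspath_from_instance(path):
    # Second-draft behaviour: ordinary attribute lookup on the instance.
    return path.__fspath__()


def fspath_from_type(path):
    # Third-draft behaviour: look the method up on the type, which is how
    # other magic methods are resolved.
    return type(path).__fspath__(path)


obj = Sneaky()
# Shadow the method with an instance attribute (an assumption made purely
# to expose the difference between the two lookups).
obj.__fspath__ = lambda: "/from/the/instance"

print(fspath_from_instance(obj))  # /from/the/instance
print(fspath_from_type(obj))      # /from/the/type
```

Resolving from the type means an instance attribute that happens to be named ``__fspath__`` cannot change the result.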
----------
PEP: NNN
Title: Adding a file system path protocol
Version: $Revision$
Last-Modified: $Date$
Author: Brett Cannon <brett@python.org>,
        Koos Zevenhoven <k7hoven@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-May-2016
Post-History: 11-May-2016,
              12-May-2016,
              13-May-2016


Abstract
========

This PEP proposes a protocol for classes which represent a file system
path to be able to provide a ``str`` or ``bytes`` representation.
Changes to Python's standard library are also proposed to utilize this
protocol where appropriate to facilitate the use of path objects where
historically only ``str`` and/or ``bytes`` file system paths are
accepted. The goal is to facilitate the migration of users towards
rich path objects while providing an easy way to work with code
expecting ``str`` or ``bytes``.


Rationale
=========

Historically in Python, file system paths have been represented as
strings or bytes. This choice of representation has stemmed from C's
own decision to represent file system paths as
``const char *`` [#libc-open]_. While that is a totally serviceable
format to use for file system paths, it's not necessarily optimal. At
issue is the fact that while all file system paths can be represented
as strings or bytes, not all strings or bytes represent a file system
path. This can lead to issues where any e.g. string duck-types to a
file system path whether it actually represents a path or not.

To help elevate the representation of file system paths from their
representation as strings and bytes to a richer object representation,
the pathlib module [#pathlib]_ was provisionally introduced in
Python 3.4 through PEP 428. While considered by some as an improvement
over strings and bytes for file system paths, it has suffered from a
lack of adoption. Typically the key issue listed for the low adoption
rate has been the lack of support in the standard library.
This lack
of support required users of pathlib to manually convert path objects
to strings by calling ``str(path)`` which many found error-prone.

One issue in converting path objects to strings comes from
the fact that the only generic way to get a string representation of
the path was to pass the object to ``str()``. This can pose a
problem when done blindly as nearly all Python objects have some
string representation whether they are a path or not, e.g.
``str(None)`` will give a result that
``builtins.open()`` [#builtins-open]_ will happily use to create a new
file.

Exacerbating this whole situation is the
``DirEntry`` object [#os-direntry]_. While path objects have a
representation that can be extracted using ``str()``, ``DirEntry``
objects expose a ``path`` attribute instead. Having no common
interface between path objects, ``DirEntry``, and any other
third-party path library has become an issue. A solution that allows
any path-representing object to declare that it is a path and a way
to extract a low-level representation that all path objects could
support is desired.

This PEP then proposes to introduce a new protocol to be followed by
objects which represent file system paths. Providing a protocol allows
for explicit signaling of what objects represent file system paths as
well as a way to extract a lower-level representation that can be used
with older APIs which only support strings or bytes.

Discussions regarding path objects that led to this PEP can be found
in multiple threads on the python-ideas mailing list archive
[#python-ideas-archive]_ for the months of March and April 2016 and on
the python-dev mailing list archives [#python-dev-archive]_ during
April 2016.


Proposal
========

This proposal is split into two parts. One part is the proposal of a
protocol for objects to declare and provide support for exposing a
file system path representation. The other part deals with changes to
Python's standard library to support the new protocol.
These changes
will also lead to the pathlib module dropping its provisional status.


Protocol
--------

The following abstract base class defines the protocol for an object
to be considered a path object::

    import abc
    import typing as t


    class PathLike(abc.ABC):

        """Abstract base class for implementing the file system path protocol."""

        @abc.abstractmethod
        def __fspath__(self) -> t.Union[str, bytes]:
            """Return the file system path representation of the object."""
            raise NotImplementedError


Objects representing file system paths will implement the
``__fspath__()`` method which will return the ``str`` or ``bytes``
representation of the path. The ``str`` representation is the
preferred low-level path representation as it is human-readable and
what people historically represent paths as.


Standard library changes
------------------------

It is expected that most APIs in Python's standard library that
currently accept a file system path will be updated appropriately to
accept path objects (whether that requires code or simply an update
to documentation will vary).
The modules mentioned below, though,
deserve specific details as they have either fundamental changes that
empower the ability to use path objects, or entail additions/removal
of APIs.


builtins
''''''''

``open()`` [#builtins-open]_ will be updated to accept path objects as
well as continue to accept ``str`` and ``bytes``.


os
'''

The ``fspath()`` function will be added with the following semantics::

    import typing as t


    def fspath(path: t.Union[PathLike, str, bytes]) -> t.Union[str, bytes]:
        """Return the string representation of the path.

        If str or bytes is passed in, it is returned unchanged.
        """
        if isinstance(path, (str, bytes)):
            return path

        # Work from the object's type to match method resolution of other magic
        # methods.
        path_type = type(path)
        try:
            return path_type.__fspath__(path)
        except AttributeError:
            if hasattr(path_type, '__fspath__'):
                raise

            raise TypeError("expected str, bytes or os.PathLike object, not "
                            + path_type.__name__)

The ``os.fsencode()`` [#os-fsencode]_ and
``os.fsdecode()`` [#os-fsdecode]_ functions will be updated to accept
path objects. As both functions coerce their arguments to
``bytes`` and ``str``, respectively, they will be updated to call
``__fspath__()`` if present to convert the path object to a ``str`` or
``bytes`` representation, and then perform their appropriate
coercion operations as if the return value from ``__fspath__()`` had
been the original argument to the coercion function in question.

The addition of ``os.fspath()``, the updates to
``os.fsencode()``/``os.fsdecode()``, and the current semantics of
``pathlib.PurePath`` provide the semantics necessary to
get the path representation one prefers. For a path object,
``pathlib.PurePath``/``Path`` can be used. To obtain the ``str`` or
``bytes`` representation without any coercion, ``os.fspath()``
can be used.
If a ``str`` is desired and the encoding of ``bytes``
should be assumed to be the default file system encoding, then
``os.fsdecode()`` should be used. If a ``bytes`` representation is
desired and any strings should be encoded using the default file
system encoding, then ``os.fsencode()`` is used. This PEP recommends
using path objects when possible and falling back to string paths as
necessary and using ``bytes`` as a last resort.

Another way to view this is as a hierarchy of file system path
representations (highest- to lowest-level): path → str → bytes. The
functions and classes under discussion can all accept objects on the
same level of the hierarchy, but they vary in whether they promote or
demote objects to another level. The ``pathlib.PurePath`` class can
promote a ``str`` to a path object. The ``os.fspath()`` function can
demote a path object to a ``str`` or ``bytes`` instance, depending
on what ``__fspath__()`` returns.
The ``os.fsdecode()`` function will demote a path object to
a string or promote a ``bytes`` object to a ``str``. The
``os.fsencode()`` function will demote a path or string object to
``bytes``. There is no function that provides a way to demote a path
object directly to ``bytes`` while bypassing string demotion.

The ``DirEntry`` object [#os-direntry]_ will gain an ``__fspath__()``
method. It will return the same value as currently found on the
``path`` attribute of ``DirEntry`` instances.

The Protocol_ ABC will be added to the ``os`` module under the name
``os.PathLike``.


os.path
'''''''

The various path-manipulation functions of ``os.path`` [#os-path]_
will be updated to accept path objects. For polymorphic functions that
accept both bytes and strings, they will be updated to simply use
``os.fspath()``.

During the discussions leading up to this PEP it was suggested that
``os.path`` not be updated using an "explicit is better than implicit"
argument.
The thinking was that since ``__fspath__()`` is polymorphic
itself it may be better to have code working with ``os.path`` extract
the path representation from path objects explicitly. There is also
the consideration that adding support this deep into the low-level OS
APIs will lead to code magically supporting path objects without
requiring any documentation updates, leading to potential complaints
when it doesn't work, unbeknownst to the project author.

But it is the view of this PEP that "practicality beats purity" in
this instance. To help facilitate the transition to supporting path
objects, it is better to make the transition as easy as possible than
to worry about unexpected/undocumented duck typing support for
path objects by projects.

There has also been the suggestion that ``os.path`` functions could be
used in a tight loop and the overhead of checking or calling
``__fspath__()`` would be too costly. In this scenario only
path-consuming APIs would be directly updated and path-manipulating
APIs like the ones in ``os.path`` would go unmodified. This would
require library authors to update their code to support path objects
if they performed any path manipulations, but if the library code
passed the path straight through then the library wouldn't need to be
updated. It is the view of this PEP and Guido, though, that this is an
unnecessary worry and that performance will still be acceptable.


pathlib
'''''''

The constructor for ``pathlib.PurePath`` and ``pathlib.Path`` will be
updated to accept ``PathLike`` objects.
Both ``PurePath`` and ``Path``
will continue to not accept ``bytes`` path representations, and so if
``__fspath__()`` returns ``bytes`` it will raise an exception.

The ``path`` attribute will be removed as this PEP makes it
redundant (it has not been included in any released version of Python
and so is not a backwards-compatibility concern).


C API
'''''

The C API will gain an equivalent function to ``os.fspath()``::

    /*
        Return the file system path of the object.

        If the object is str or bytes, then allow it to pass through with
        an incremented refcount. If the object defines __fspath__(), then
        return the result of that method. All other types raise a TypeError.
    */
    PyObject *
    PyOS_FSPath(PyObject *path)
    {
        if (PyUnicode_Check(path) || PyBytes_Check(path)) {
            Py_INCREF(path);
            return path;
        }

        /* Look the method up on the type, not the instance. */
        if (PyObject_HasAttrString((PyObject *)Py_TYPE(path), "__fspath__")) {
            /* PyObject_CallMethod() takes the method name as a C string. */
            return PyObject_CallMethod((PyObject *)Py_TYPE(path),
                                       "__fspath__", "(O)", path);
        }

        return PyErr_Format(PyExc_TypeError,
                            "expected a str, bytes, or os.PathLike object, not %S",
                            (PyObject *)Py_TYPE(path));
    }


Backwards compatibility
=======================

There are no explicit backwards-compatibility concerns. Unless an
object incidentally already defines a ``__fspath__()`` method there is
no reason to expect the pre-existing code to break or expect to have
its semantics implicitly changed.

Libraries wishing to support path objects and a version of Python
prior to Python 3.6 and the existence of ``os.fspath()`` can use the
idiom of
``path.__fspath__() if hasattr(path, "__fspath__") else path``.


Implementation
==============

This is the task list for what this PEP proposes:

#. Remove the ``path`` attribute from pathlib
#. Remove the provisional status of pathlib
#. Add ``os.PathLike``
#. Add ``os.fspath()``
#. Add ``PyOS_FSPath()``
#. Update ``os.fsencode()``
#. Update ``os.fsdecode()``
#. Update ``pathlib.PurePath`` and ``pathlib.Path``
#. Update ``builtins.open()``
#. Update ``os.DirEntry``
#. Update ``os.path``
#. Add a glossary entry for "path-like"


Rejected Ideas
==============

Other names for the protocol's method
-------------------------------------

Various names were proposed during discussions leading to this PEP,
including ``__path__``, ``__pathname__``, and ``__fspathname__``. In
the end people seemed to gravitate towards ``__fspath__`` for being
unambiguous without being unnecessarily long.


Separate str/bytes methods
--------------------------

At one point it was suggested that ``__fspath__()`` only return
strings and another method named ``__fspathb__()`` be introduced to
return bytes. The thinking is that by making ``__fspath__()`` not be
polymorphic it could make dealing with the potential string or bytes
representations easier. But the general consensus was that returning
bytes will more than likely be rare and that the various functions in
the os module are the better abstraction to promote over direct
calls to ``__fspath__()``.


Providing a ``path`` attribute
------------------------------

To help deal with the issue of ``pathlib.PurePath`` not inheriting
from ``str``, originally it was proposed to introduce a ``path``
attribute to mirror what ``os.DirEntry`` provides. In the end,
though, it was determined that a protocol would provide the same
result while not directly exposing an API that most people will never
need to interact with directly.


Have ``__fspath__()`` only return strings
-----------------------------------------

Much of the discussion that led to this PEP revolved around whether
``__fspath__()`` should be polymorphic and return ``bytes`` as well as
``str`` or only return ``str``.
The general sentiment for this view
was that ``bytes`` are difficult to work with due to their
inherent lack of information about their encoding and PEP 383 makes
it possible to represent all file system paths using ``str`` with the
``surrogateescape`` handler. Thus, it would be better to forcibly
promote the use of ``str`` as the low-level path representation for
high-level path objects.

In the end, it was decided that using ``bytes`` to represent paths is
simply not going to go away and thus they should be supported to some
degree. The hope is that people will gravitate towards path objects
like pathlib and that will move people away from operating directly
with ``bytes``.


A generic string encoding mechanism
-----------------------------------

At one point there was a discussion of developing a generic mechanism
to extract a string representation of an object that had semantic
meaning (``__str__()`` does not necessarily return anything of
semantic significance beyond what may be helpful for debugging). In
the end, it was deemed to lack a motivating need beyond the one this
PEP is trying to solve in a specific fashion.


Have __fspath__ be an attribute
-------------------------------

It was briefly considered to have ``__fspath__`` be an attribute
instead of a method. This was rejected for two reasons. One,
historically protocols have been implemented as "magic methods" and
not "magic methods and attributes". Two, there is no guarantee that
the lower-level representation of a path object will be pre-computed,
potentially misleading users that there was no expensive computation
behind the scenes in case the attribute was implemented as a property.

This also indirectly ties into the idea of introducing a ``path``
attribute to accomplish the same thing. This idea has an added issue,
though, of accidentally having any object with a ``path`` attribute
meet the protocol's duck typing.
Introducing a new magic method for
the protocol helpfully avoids any accidental opting into the protocol.


Provide specific type hinting support
-------------------------------------

There was some consideration to providing a generic ``typing.PathLike``
class which would allow for e.g. ``typing.PathLike[str]`` to specify
a type hint for a path object which returned a string representation.
While potentially beneficial, the usefulness was deemed too small to
bother adding the type hint class.

This also removed any desire to have a class in the ``typing`` module
which represented the union of all acceptable path-representing types
as that can be represented with
``typing.Union[str, bytes, os.PathLike]`` easily enough and the hope
is users will slowly gravitate to path objects only.


Provide ``os.fspathb()``
------------------------

It was suggested that to mirror the structure of e.g.
``os.getcwd()``/``os.getcwdb()``, that ``os.fspath()`` only return
``str`` and that another function named ``os.fspathb()`` be
introduced that only returned ``bytes``. This was rejected as the
purposes of the ``*b()`` functions are tied to querying the file
system where there is a need to get the raw bytes back. As this PEP
does not work directly with data on a file system (but which *may*
be), the view was taken this distinction is unnecessary. It's also
believed that the need for only bytes will not be common enough to
need to support in such a specific manner as ``os.fsencode()`` will
provide similar functionality.


Call ``__fspath__()`` off of the instance
-----------------------------------------

An earlier draft of this PEP had ``os.fspath()`` calling
``path.__fspath__()`` instead of ``type(path).__fspath__(path)``.
This was
changed to be consistent with how other magic methods in Python are
resolved.


Acknowledgements
================

Thanks to everyone who participated in the various discussions related
to this PEP that spanned both python-ideas and python-dev. Special
thanks to Stephen Turnbull for direct feedback on early drafts of this
PEP. More special thanks to Koos Zevenhoven and Ethan Furman for not
only feedback on early drafts of this PEP but also helping to drive
the overall discussion on this topic across the two mailing lists.


References
==========

.. [#python-ideas-archive] The python-ideas mailing list archive

.. [#python-dev-archive] The python-dev mailing list archive

.. [#libc-open] ``open()`` documentation for the C standard library

.. [#pathlib] The ``pathlib`` module

.. [#builtins-open] The ``builtins.open()`` function

.. [#os-fsencode] The ``os.fsencode()`` function

.. [#os-fsdecode] The ``os.fsdecode()`` function

.. [#os-direntry] The ``os.DirEntry`` class

.. [#os-path] The ``os.path`` module


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: