GH-73435: Add pathlib.PurePath.full_match()
by barneygale · Pull Request #114350 · python/cpython (original) (raw)
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
In 49f90ba we added support for the recursive wildcard **
in
pathlib.PurePath.match()
. This should allow arbitrary prefix and suffix
matching, like p.match('foo/**')
or p.match('**/foo')
, but there's a
problem: for relative patterns only, match()
implicitly inserts a **
token on the left hand side, causing all patterns to match from the right.
As a result, it's impossible to match relative patterns from the left:
PurePath('foo/bar').match('bar/**')
is true!
This commit reverts the changes to match()
, and instead adds a new
globmatch()
method that:
- Supports the recursive wildcard
**
- Matches the entire path when given a relative pattern
As a result, globmatch()
's pattern language exactly matches that of
glob()
.
Does the problem extend beyond "**
is allowed to match nothing"? Or can we change the existing algorithm to require that a trailing **
has to match at least one segment?
Presumably .match("foo\\*")
has to find something after foo\
, so would be a better version of "foo\\**"
anyway, but I can't imagine anyone being happy that "foo\\**"
matches "foo\\"
- you wouldn't specify the **
if you didn't want something to be found there.
**
matching nothing is quite normal I think - it's what glob.glob()
, pathlib.Path.glob()
, zsh, and IIRC bash do. If you want at least one segment after foo/
, you can spell that with foo/**/*
.
e.g. zsh man page:
A pathname component of the form '(foo/)#' matches a path consisting of zero or more directories matching the pattern foo.
As a shorthand, '**/' is equivalent to '(*/)#'; note that this therefore matches files in the current directory as well as subdirectories. Thus:
or
does a recursive directory search for files named 'bar' (potentially including the file 'bar' in the current directory).
Okay, I definitely like having full
in the name somewhere then. It's still a match, and the parallel is fullmatch is way more helpful than to glob (which would just raise questions about "how does match
do its matching then?")
barneygale changed the title
GH-73435: Add GH-73435: Add pathlib.PurePath.globmatch()
pathlib.PurePath.full_match()
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks for the reviews, both.
aisk pushed a commit to aisk/cpython that referenced this pull request
In 49f90ba we added support for the recursive wildcard **
in
pathlib.PurePath.match()
. This should allow arbitrary prefix and suffix
matching, like p.match('foo/**')
or p.match('**/foo')
, but there's a
problem: for relative patterns only, match()
implicitly inserts a **
token on the left hand side, causing all patterns to match from the right.
As a result, it's impossible to match relative patterns from the left:
PurePath('foo/bar').match('bar/**')
is true!
This commit reverts the changes to match()
, and instead adds a new
full_match()
method that:
- Allows empty patterns
- Supports the recursive wildcard
**
- Matches the entire path when given a relative pattern
Glyphack pushed a commit to Glyphack/cpython that referenced this pull request
In 49f90ba we added support for the recursive wildcard **
in
pathlib.PurePath.match()
. This should allow arbitrary prefix and suffix
matching, like p.match('foo/**')
or p.match('**/foo')
, but there's a
problem: for relative patterns only, match()
implicitly inserts a **
token on the left hand side, causing all patterns to match from the right.
As a result, it's impossible to match relative patterns from the left:
PurePath('foo/bar').match('bar/**')
is true!
This commit reverts the changes to match()
, and instead adds a new
full_match()
method that:
- Allows empty patterns
- Supports the recursive wildcard
**
- Matches the entire path when given a relative pattern