GH-73435: Add pathlib.PurePath.full_match() by barneygale · Pull Request #114350 · python/cpython (original) (raw)

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

barneygale

@barneygale

In 49f90ba we added support for the recursive wildcard ** in pathlib.PurePath.match(). This should allow arbitrary prefix and suffix matching, like p.match('foo/**') or p.match('**/foo'), but there's a problem: for relative patterns only, match() implicitly inserts a ** token on the left hand side, causing all patterns to match from the right. As a result, it's impossible to match relative patterns from the left: PurePath('foo/bar').match('bar/**') is true!

This commit reverts the changes to match(), and instead adds a new globmatch() method that:

As a result, globmatch()'s pattern language exactly matches that of glob().

@barneygale

hugovk

@barneygale

@zooba

Does the problem extend beyond "** is allowed to match nothing"? Or can we change the existing algorithm to require that a trailing ** has to match at least one segment?

Presumably .match("foo\\*") has to find something after foo\, so would be a better version of "foo\\**" anyway, but I can't imagine anyone being happy that "foo\\**" matches "foo\\" - you wouldn't specify the ** if you didn't want something to be found there.

@barneygale

** matching nothing is quite normal I think - it's what glob.glob(), pathlib.Path.glob(), zsh, and IIRC bash do. If you want at least one segment after foo/, you can spell that with foo/**/*.

@barneygale

e.g. zsh man page:

A pathname component of the form '(foo/)#' matches a path consisting of zero or more directories matching the pattern foo.

As a shorthand, '**/' is equivalent to '(*/)#'; note that this therefore matches files in the current directory as well as subdirectories. Thus:

or

does a recursive directory search for files named 'bar' (potentially including the file 'bar' in the current directory).

@zooba

Okay, I definitely like having full in the name somewhere then. It's still a match, and the parallel is fullmatch is way more helpful than to glob (which would just raise questions about "how does match do its matching then?")

@barneygale

@barneygale

@barneygale barneygale changed the titleGH-73435: Add pathlib.PurePath.globmatch() GH-73435: Add pathlib.PurePath.full_match()

Jan 22, 2024

@barneygale

zooba

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@barneygale

Thanks for the reviews, both.

aisk pushed a commit to aisk/cpython that referenced this pull request

Feb 11, 2024

@barneygale @aisk

In 49f90ba we added support for the recursive wildcard ** in pathlib.PurePath.match(). This should allow arbitrary prefix and suffix matching, like p.match('foo/**') or p.match('**/foo'), but there's a problem: for relative patterns only, match() implicitly inserts a ** token on the left hand side, causing all patterns to match from the right. As a result, it's impossible to match relative patterns from the left: PurePath('foo/bar').match('bar/**') is true!

This commit reverts the changes to match(), and instead adds a new full_match() method that:

Glyphack pushed a commit to Glyphack/cpython that referenced this pull request

Sep 2, 2024

@barneygale @Glyphack

In 49f90ba we added support for the recursive wildcard ** in pathlib.PurePath.match(). This should allow arbitrary prefix and suffix matching, like p.match('foo/**') or p.match('**/foo'), but there's a problem: for relative patterns only, match() implicitly inserts a ** token on the left hand side, causing all patterns to match from the right. As a result, it's impossible to match relative patterns from the left: PurePath('foo/bar').match('bar/**') is true!

This commit reverts the changes to match(), and instead adds a new full_match() method that: