Issue 25584: a recursive glob pattern fails to list files in the current directory

Created on 2015-11-08 15:54 by xdegaye, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name                  Uploaded
rglob_zero_dirs.patch      serhiy.storchaka, 2015-11-09 07:14
rglob_zero_dirs_2.patch    serhiy.storchaka, 2015-11-09 08:45
rglob_isdir.diff           xdegaye, 2015-11-09 13:03
rglob_isdir_2.diff         xdegaye, 2015-11-09 18:02
Messages (10)
msg254344 - Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2015-11-08 15:54
On archlinux during an upgrade, the package manager backs up some files in /etc with a .pacnew extension. On my system there are 20 such files: 9 .pacnew files located in /etc and 11 .pacnew files in subdirectories of /etc. The following commands are run from /etc:

    $ shopt -s globstar
    $ ls **/*.pacnew | wc -w
    20
    $ ls *.pacnew | wc -w
    9

With python:

    $ python
    Python 3.6.0a0 (default:72cca30f4707, Nov 2 2015, 14:17:31)
    [GCC 5.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import glob
    >>> len(glob.glob('./**/*.pacnew', recursive=True))
    20
    >>> len(glob.glob('*.pacnew'))
    9
    >>> len(glob.glob('**/*.pacnew', recursive=True))
    11

The '**/*.pacnew' pattern does not list the files in /etc, only those located in the subdirectories of /etc.
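The report above can be reproduced without touching /etc. The following is a minimal sketch, assuming an interpreter that already contains the fix from this issue (3.5.1 or later); the file names and the helpers `build_tree` and `matches` are hypothetical and exist only for illustration. On an unfixed 3.5.0 the '**/*.pacnew' pattern would be expected to miss the two top-level files.

```python
import glob
import os
import tempfile


def build_tree(root):
    """Create two top-level and one nested .pacnew file (hypothetical names)."""
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.pacnew", "b.pacnew", os.path.join("sub", "c.pacnew")):
        with open(os.path.join(root, rel), "w"):
            pass


def matches(pattern, root):
    """Run a recursive glob with `root` as the current directory."""
    old_cwd = os.getcwd()
    os.chdir(root)
    try:
        return sorted(glob.glob(pattern, recursive=True))
    finally:
        os.chdir(old_cwd)


with tempfile.TemporaryDirectory() as tmp:
    build_tree(tmp)
    top_level = matches("*.pacnew", tmp)       # files in the directory itself
    recursive = matches("**/*.pacnew", tmp)    # the pattern from the report
    dotted = matches("./**/*.pacnew", tmp)     # the variant that already worked

print(len(top_level), len(recursive), len(dotted))
```

With the fix applied, the two recursive patterns should report the same number of files, which is the behavior bash shows with globstar enabled.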
msg254352 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-11-08 17:46
I believe this behavior matches the documentation: "If the pattern is followed by an os.sep, only directories and subdirectories match." ('the pattern' being '**') I wonder if '***.pacnew' would work.
msg254354 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-08 18:30
I no longer remember whether it was a deliberate design or just an implementation detail. In any case it is not documented.

> I believe this behavior matches the documentation:

No, it is not related. That sentence means that './**/' will list only directories, not regular files.

> I wonder if '***.pacnew' would work.

No, only ** as a whole path component works.
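Both points in this reply can be checked directly. The sketch below is an illustration under assumptions, not part of the original thread: the file names are hypothetical, and it relies only on `glob.glob` and `os.path.isdir`. It compares '***.pacnew' with a plain '*.pacnew', and checks that every match of './**/' is a directory.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one file at the top, one in a subdirectory.
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("top.pacnew", os.path.join("sub", "nested.pacnew")):
        with open(os.path.join(tmp, rel), "w"):
            pass

    old_cwd = os.getcwd()
    os.chdir(tmp)
    try:
        # '***' is not a whole '**' path component, so it is not recursive.
        triple_star = sorted(glob.glob("***.pacnew", recursive=True))
        single_star = sorted(glob.glob("*.pacnew"))
        # A trailing separator restricts the matches to directories.
        dir_matches = glob.glob("./**/", recursive=True)
        only_dirs = all(os.path.isdir(p) for p in dir_matches)
    finally:
        os.chdir(old_cwd)

print(triple_star, single_star, only_dirs)
```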
msg254366 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-11-09 04:39
Ah, I see, 'pattern' there means the whole pattern. That certainly isn't clear.
msg254370 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-09 07:14
It was likely an implementation artifact. The current implementation was simpler and fit the existing glob design better. The problem was that '**/a' should list 'a' and 'd/a', but '**/' should list only 'd/', and not ''. Here is a patch that makes '**' also match zero directories. The old tests pass, and new tests are added to cover this case.
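The intended semantics described in this message can be written down as a small check. This is a sketch assuming a Python with the patch applied (3.5.1 or later); the names 'a' and 'd' come from the message, everything else is scaffolding added for illustration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Layout from the message: a file 'a' and a directory 'd' containing 'a'.
    os.mkdir(os.path.join(tmp, "d"))
    for rel in ("a", os.path.join("d", "a")):
        with open(os.path.join(tmp, rel), "w"):
            pass

    old_cwd = os.getcwd()
    os.chdir(tmp)
    try:
        # '**' may match zero directories, so the top-level 'a' is included.
        files_named_a = sorted(glob.glob("**/a", recursive=True))
        # With a trailing separator only real directories are listed,
        # and the empty string is not among them.
        dirs_only = sorted(glob.glob("**/", recursive=True))
    finally:
        os.chdir(old_cwd)

print(files_named_a)
print(dirs_only)
```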
msg254386 - Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2015-11-09 13:03
FWIW the patch looks good to me. I find the code in glob.py difficult to read, as it happily joins regular filenames together with os.path.join() or attempts to list the files contained in a regular file (sic). The attached diff makes the code more correct and easier to understand. It is meant to be applied on top of Serhiy's patch.
msg254397 - Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2015-11-09 18:02
glob('invalid_dir/**', recursive=True) triggers the assert that was added by my patch in _rlistdir(). This new patch fixes this: when there is no magic character in the dirname part of a split(), and dirname is not an existing directory, then there is nothing to yield and the processing of pathname must stop (and thus in this case, no call is made to glob2() when basename is '**').
msg254412 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-11-09 21:19
New changeset 4532c4f37429 by Serhiy Storchaka in branch '3.5':
Issue #25584: Fixed recursive glob() with patterns starting with '**'.
https://hg.python.org/cpython/rev/4532c4f37429

New changeset 175cd763de57 by Serhiy Storchaka in branch 'default':
Issue #25584: Fixed recursive glob() with patterns starting with '**'.
https://hg.python.org/cpython/rev/175cd763de57

New changeset fefc10de2775 by Serhiy Storchaka in branch '3.5':
Issue #25584: Added "escape" to the __all__ list in the glob module.
https://hg.python.org/cpython/rev/fefc10de2775

New changeset 128e61cb3de2 by Serhiy Storchaka in branch 'default':
Issue #25584: Added "escape" to the __all__ list in the glob module.
https://hg.python.org/cpython/rev/128e61cb3de2
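The second pair of changesets affects what `from glob import *` exports. A short check, assuming an interpreter that includes those changesets; the sample file name is hypothetical.

```python
import glob

# After the changesets above, "escape" is part of the public star-import API.
exported = "escape" in glob.__all__

# glob.escape() wraps each of the special characters '*', '?' and '['
# in brackets so that they are matched literally.
escaped = glob.escape("notes[1]*.txt")

print(exported, escaped)
```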
msg254414 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-09 21:58
Please open a new issue for the glob() optimization, Xavier.
msg254441 - Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2015-11-10 11:21
New issue 25596 entered: regular files handled as directories in the glob module. Thanks for fixing this, Serhiy.
History
Date User Action Args
2022-04-11 14:58:23 admin set github: 69770
2015-11-10 11:21:41 xdegaye set messages: +
2015-11-09 21:58:34 serhiy.storchaka set status: open -> closed; resolution: fixed; messages: +; stage: patch review -> resolved
2015-11-09 21:19:30 python-dev set nosy: + python-dev; messages: +
2015-11-09 18:02:23 xdegaye set files: + rglob_isdir_2.diff; messages: +
2015-11-09 13:03:51 xdegaye set files: + rglob_isdir.diff; messages: +
2015-11-09 08:45:36 serhiy.storchaka set files: + rglob_zero_dirs_2.patch
2015-11-09 07:14:45 serhiy.storchaka set files: + rglob_zero_dirs.patch; versions: + Python 3.5; messages: +; keywords: + patch; stage: patch review
2015-11-09 04:39:57 r.david.murray set messages: +
2015-11-08 18:30:59 serhiy.storchaka set assignee: serhiy.storchaka; messages: +
2015-11-08 17:46:12 r.david.murray set nosy: + r.david.murray, pitrou; messages: +
2015-11-08 15:54:35 xdegaye create