Issue 8402: Add a function to escape metacharacters in glob/fnmatch

process

Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Aquinas, Tilka, a1abhishek, docs@python, eric.araujo, eric.smith, ezio.melotti, george.hu, kveretennicov, l0nwlf, martin.panter, mrabarnett, python-dev, serhiy.storchaka, terry.reedy
Priority: normal Keywords: needs review, patch

Created on 2010-04-15 00:51 by george.hu, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name (uploaded by, date)
issue8402.patch (maker, 2012-10-13 15:35)
issue8402.1.patch (maker, 2012-10-14 21:04)
fnmatch_escape.patch (serhiy.storchaka, 2012-10-15 08:42)
fnmatch_escape_2.patch (serhiy.storchaka, 2012-10-15 14:50)
fnmatch_implementation.py (mrabarnett, 2013-03-07 16:28)
glob_escape.patch (serhiy.storchaka, 2013-03-11 14:39)
glob_escape_2.patch (serhiy.storchaka, 2013-11-17 22:44)
glob_escape_3.patch (serhiy.storchaka, 2013-11-18 10:33)
Messages (36)
msg103160 - (view) Author: george hu (george.hu) Date: 2010-04-15 00:51
Have this problem in python 2.5.4 under windows. I'm trying to return a list of files in a directory by using glob. It keeps returning an empty list until I tested/adjusted the folder name by removing the "[" character from it. Not sure if this is a bug.
glob.glob("c:\abc\afolderwith[test]\*") returns empty list
glob.glob("c:\abc\afolderwithtest]\*") returns files
msg103163 - (view) Author: Shashwat Anand (l0nwlf) Date: 2010-04-15 01:06
When you do:
glob.glob("c:\abc\afolderwith[test]\*") returns empty list
it looks for all files in three directories:
c:\abc\afolderwitht\*
c:\abc\afolderwithe\*
c:\abc\afolderwiths\*
Of course they do not exist, so it returns an empty list.
06:35:05 l0nwlf-MBP:Desktop $ ls -R test
1 2 3
06:35:15 l0nwlf-MBP:Desktop $ ls -R test1
alpha beta gamma
>>> glob.glob('/Users/l0nwlf/Desktop/test[123]/*')
['/Users/l0nwlf/Desktop/test1/alpha', '/Users/l0nwlf/Desktop/test1/beta', '/Users/l0nwlf/Desktop/test1/gamma']
As you can see, by giving the argument test[123] it looked for test1, test2, test3. Since test1 existed, it gave all the files present within it.
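The character-set behaviour described above can be checked without touching the file system. This is a minimal sketch using fnmatch.fnmatchcase (chosen here so that no platform-specific case or separator normalization is applied); the directory names are the ones from the report.

```python
import fnmatch

pattern = "afolderwith[test]"

# Names ending in exactly one character from the set {t, e, s} match.
matches = [name for name in ("afolderwitht", "afolderwithe", "afolderwiths")
           if fnmatch.fnmatchcase(name, pattern)]

# The literal directory name, brackets included, does not match its own
# spelling when that spelling is used as a pattern.
literal_matches = fnmatch.fnmatchcase("afolderwith[test]", pattern)

print(matches)
print(literal_matches)
```

This is why the original call returned an empty list: the pattern never asks for a directory whose name contains brackets.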
msg103164 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-04-15 01:09
See the explanation at http://docs.python.org/library/fnmatch.html#module-fnmatch , which uses the same rules.
msg103165 - (view) Author: george hu (george.hu) Date: 2010-04-15 01:16
Ok, what if the name of the directory contains "[]" characters? What is the escape string for that?
msg103168 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-04-15 01:27
The documentation for fnmatch.translate, which is what ultimately gets called, says: There is no way to quote meta-characters. Sorry. If you want to see this changed, you could open a feature request. If you have a patch, that would help! You probably want to research what the Unix shells use for escaping globs.
msg103171 - (view) Author: Shashwat Anand (l0nwlf) Date: 2010-04-15 01:34
glob module does not provide what you want. As a workaround you can try:
os.listdir("c:\abc\afolderwith[test]")
07:02:52 l0nwlf-MBP:Desktop $ ls -R test\[123\]/
1 2 3
>>> os.listdir('/Users/l0nwlf/Desktop/test[123]')
['1', '2', '3']
Changing type to 'Feature Request'
msg103173 - (view) Author: george hu (george.hu) Date: 2010-04-15 01:40
Well, the listdir doesn't support "wildcard", for example, listdir("*.app"). I know the glob is kind of unix shell style expanding, but my program is running under windows, it's my tiny script to walk through a huge directory in my NAS. And there are many directories named with "[]" and "()" characters amid. May the only way is to program a filter on the listdir.
msg103174 - (view) Author: george hu (george.hu) Date: 2010-04-15 01:43
Well, the listdir doesn't support "wildcard", for example, listdir("*.app"). I know the glob is kind of unix shell style expanding, but my program is running under windows, it's my tiny script to walk through a huge directory in my NAS. And there are many directories named with "[]" and "()" characters amid. May be the only way is to write a filter on the listdir.
msg103175 - (view) Author: Shashwat Anand (l0nwlf) Date: 2010-04-15 01:46
You repeated the same comment twice and added an 'unnamed' file. I assume you did it by mistake.
msg106545 - (view) Author: Dan Gawarecki (Aquinas) Date: 2010-05-26 17:09
Shouldn't the title be updated to indicate the fnmatch is the true source of the behavior (I'm basing this on http://docs.python.org/library/glob.html indicating the fnmatch is invoked by glob). I'm not using glob, but fnmatch in my attempt to find filenames that look like "Ajax_[version2].txt". If nothing else, it would have helped me if the documentation would state whether or not the brackets could be escaped. It doesn't appear from my tests (trying "Ajax_\[version2\].txt" and "Ajax_\\[version2\\].txt") that 'escaping' is possible, but if the filter pattern gets turned into a regular expression, I think escaping *would* be possible. Is that a reasonable assumption? I'm running 2.5.1 under Windows, and this is my first ever post to the bugs list.
msg106548 - (view) Author: Dan Gawarecki (Aquinas) Date: 2010-05-26 17:17
Following up... I saw Eric Smith's 2nd note (2010-04-15 @1:27) about fnmatch.translate documentation stating that "There is no way to quote meta-characters." When I looked at: http://docs.python.org/library/fnmatch.html#module-fnmatch did not see this statement appear anywhere. Would this absence be because someone is working on making this enhancement?
msg106550 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-05-26 17:32
I don't think so. That quote came from the docstring for fnmatch.translate.
>>> help(fnmatch.translate)
Help on function translate in module fnmatch:
translate(pat)
    Translate a shell PATTERN to a regular expression.
    There is no way to quote meta-characters.
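To illustrate what "no way to quote meta-characters" means in practice, here is a small sketch of the backslash attempts described earlier in the thread. It assumes current fnmatch behaviour, where a backslash in a pattern is an ordinary character; raw strings are used so that the number of backslashes reaching fnmatch is unambiguous.

```python
import fnmatch

name = "Ajax_[version2].txt"

# One literal backslash before each bracket: the backslash must itself be
# present in the name, and "[version2\]" is still parsed as a character set.
single = fnmatch.fnmatchcase(name, r"Ajax_\[version2\].txt")

# Two literal backslashes fare no better, for the same reason.
double = fnmatch.fnmatchcase(name, r"Ajax_\\[version2\\].txt")

print(single, double)
```

Note that in a non-raw Python string literal, "\[" and "\\[" produce the same text, so the two attempts in the report may well have been the same pattern.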
msg109682 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-07-09 05:06
The 3.1.2 doc for fnmatch.translate no longer says "There is no way to quote meta-characters." If that is still true (no quoting method is given that I can see), then that removal is something of a regression.
msg109743 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-07-09 14:15
The note about no quoting meta-chars is in the docstring for fnmatch.translate, not the documentation. I still see it in 3.1. I have a to-do item to add this to the actual documentation. I'll add an issue.
msg147434 - (view) Author: Tillmann Karras (Tilka) Date: 2011-11-11 14:33
As a workaround, it is possible to make every glob character a character set of one character (wrapping it with [] ). The gotcha here is that you can't just use multiple replaces because you would escape the escape brackets. Here is a function adapted from [1]:
import re

def escape_glob(path):
    # Map each glob metacharacter to a one-character set that matches it literally.
    transdict = {
        '[': '[[]',
        ']': '[]]',
        '*': '[*]',
        '?': '[?]',
    }
    # Replace in a single pass so the brackets added here are not escaped again.
    rc = re.compile('|'.join(map(re.escape, transdict)))
    return rc.sub(lambda m: transdict[m.group(0)], path)
[1] http://www.daniweb.com/software-development/python/code/216636
msg172635 - (view) Author: (a1abhishek) Date: 2012-10-11 12:15
i m agree with answer number 6. the resolution mentioned is quite easy and very effectve thanks http://www.packersmoversdirectory.net/
msg172810 - (view) Author: Michele Orrù (maker) * Date: 2012-10-13 15:35
The attached patch adds support for '\\' escaping to fnmatch, and consequently to glob.
msg172919 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-10-14 20:45
I have comments on the patch but a review link does not appear. Could you update your clone to latest default revision and regenerate the patch? Thanks.
msg172922 - (view) Author: Michele Orrù (maker) * Date: 2012-10-14 21:04
Noblesse oblige :)
msg172948 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-15 07:38
> The attached patch adds support for '\\' escaping to fnmatch, and consequently to glob. This is a backward incompatible change. For example glob.glob(r'C:\Program Files\*') will be broken. As flacs says a way to escape metacharacters in glob/fnmatch already exists. If someone want to match literal name "Ajax_[version2].txt" it should use pattern "Ajax_[[]version2].txt". Documentation should explicitly mentions such way. It will be good also to add new fnmatch.escape() function.
msg172951 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-15 08:42
Here is a patch which adds the fnmatch.escape() function.
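The two points made in the preceding message can be checked directly. This is only a sketch against fnmatch as it behaves today, not against either attached patch: a one-character set already matches a literal "[", and a backslash is currently an ordinary character, which Windows-style patterns depend on.

```python
import fnmatch

# Wrapping "[" in a one-character set makes it literal; the later "]" has no
# opening bracket to pair with, so it is matched literally as well.
bracket_ok = fnmatch.fnmatchcase("Ajax_[version2].txt", "Ajax_[[]version2].txt")

# A backslash is just another character, so a Windows-style pattern keeps its
# separators and the trailing "*" stays a wildcard.
windows_ok = fnmatch.fnmatchcase(r"C:\Program Files\Python", r"C:\Program Files\*")

print(bracket_ok, windows_ok)
```

If the backslash became an escape character, the "\*" in the second pattern would have to mean a literal asterisk, which is the incompatibility raised above.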
msg172958 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-15 10:50
I am not sure if escape() should support bytes. translate() doesn't.
msg172973 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-10-15 14:16
I think the escaping workaround should be documented in the glob and/or fnmatch docs. This way users can simply do:
import glob
glob.glob(r"c:\abc\afolderwith[[]test]\*")
rather than
import glob
import fnmatch
glob.glob(fnmatch.escape(r"c:\abc\afolderwith[test]") + r"\*")
The function might still be useful with patterns constructed programmatically, but I'm not sure how common the problem really is.
msg172977 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-15 14:45
> I think the escaping workaround should be documented in the glob and/or fnmatch docs. See . This issue left for enhancement.
msg172979 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-15 14:50
Patch updated (thanks Ezio for review and comments).
msg175767 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-11-17 16:15
The workaround is now documented. I'm still not sure if this should still be added, or if it should be closed as rejected now that the workaround is documented. A third option would be adding it as a recipe in the doc, given that the whole function boils down to a single re.sub (the user can take care of picking the bytes/str regex depending on their input).
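For reference, the "single re.sub" recipe mentioned above could look roughly like the following. This is a sketch, not the code from any attached patch; escape_pattern is a hypothetical name, and the choice to escape only "*", "?" and "[" is an assumption based on the observation that "]" is harmless once every "[" has been neutralized.

```python
import re

# One compiled pattern per input type, since str and bytes regexes cannot mix.
_magic_str = re.compile(r"([*?\[])")
_magic_bytes = re.compile(rb"([*?\[])")


def escape_pattern(text):
    """Hypothetical helper: wrap each of *, ? and [ in a one-character set."""
    if isinstance(text, bytes):
        return _magic_bytes.sub(rb"[\1]", text)
    return _magic_str.sub(r"[\1]", text)


print(escape_pattern("Ajax_[version2]*.txt"))
print(escape_pattern(b"what?.txt"))
```

Because the substitution happens in one pass, the brackets it inserts are never themselves escaped.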
msg177000 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-05 19:31
It is good for the stdlib to have a function for escaping special characters, even if the function is simple. There are already escape functions for re and sgml/xml/html. The private function glob.glob1 is used in Lib/msilib and Tools/msi to prevent unexpected globbing in the parent directory name. ``glob.glob1(dirname, pattern)`` should be replaced by ``glob.glob(os.path.join(fnmatch.escape(dirname), pattern))`` in external code.
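The replacement suggested above can be sketched as follows. One assumption to flag: the function that was eventually committed for this issue is glob.escape() (Python 3.4 and later), so the sketch uses that name rather than the fnmatch.escape() proposed at this point in the thread. The directory and file names are made up for the demonstration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # A parent directory whose name contains glob metacharacters.
    dirname = os.path.join(root, "afolderwith[test]")
    os.mkdir(dirname)
    for filename in ("a.txt", "b.txt", "c.log"):
        open(os.path.join(dirname, filename), "w").close()

    # Unescaped: "[test]" is read as a character set, so nothing is found.
    unescaped = glob.glob(os.path.join(dirname, "*.txt"))

    # Escaped parent, with a live pattern only in the last path component.
    escaped = sorted(glob.glob(os.path.join(glob.escape(dirname), "*.txt")))
    found = [os.path.basename(path) for path in escaped]

print(unescaped)
print(found)
```

Only the directory part is escaped; the pattern in the final component is left alone so that it still expands.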
msg183676 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2013-03-07 16:28
I've attached fnmatch_implementation.py, which is a simple pure-Python implementation of the fnmatch function. It's not as susceptible to catastrophic backtracking as the current re-based one. For example: fnmatch('a' * 50, '*a*' * 50) completes quickly.
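The attached file is not reproduced in this thread, so the following is not that implementation. It is a minimal sketch of one well-known way to match "*" and "?" without a regular expression: remember the most recent "*" and, on a mismatch, let it absorb one more character. The name simple_match is hypothetical, and character sets ("[...]") are deliberately left out.

```python
def simple_match(name, pattern):
    """Match name against a pattern that supports only "*" and "?"."""
    n = p = 0
    star_p = -1   # index in pattern of the most recent "*", or -1 if none
    star_n = 0    # index in name where that "*" started matching
    while n < len(name):
        if p < len(pattern) and pattern[p] == "*":
            # Record the star and first try matching it against nothing.
            star_p, star_n = p, n
            p += 1
        elif p < len(pattern) and (pattern[p] == "?" or pattern[p] == name[n]):
            n += 1
            p += 1
        elif star_p != -1:
            # Mismatch: let the last "*" absorb one more character and retry.
            star_n += 1
            n = star_n
            p = star_p + 1
        else:
            return False
    # The name is exhausted; only trailing "*" may remain in the pattern.
    return all(ch == "*" for ch in pattern[p:])


print(simple_match("a" * 50, "*a*" * 50))
```

Since only the latest "*" is ever revisited, the work is bounded by roughly len(name) * len(pattern) steps, which is why the example from the message finishes immediately.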
msg183679 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-03-07 16:32
I think it should be a separate issue.
msg183966 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-03-11 14:39
Escaping for glob on Windows is not so trivial. Special characters in the drive part have no special meaning and should not be escaped. I.e. ``escape('//?/c:/Quo vadis?.txt')`` should return ``'//?/c:/Quo vadis[?].txt'``. Perhaps we should move the escape function to the glob module (because it is glob's peculiarity). Here is a patch for glob.escape().
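A short sketch of glob.escape() as it exists in Python 3.4 and later. The drive-prefix example from the message depends on how the platform splits off a drive (POSIX paths have no drive part), so the inputs below are drive-free names chosen to behave the same everywhere.

```python
import glob

# "?" is wrapped in a one-character set.
escaped_name = glob.escape("Quo vadis?.txt")

# Only the opening bracket needs wrapping; the closing one is left as-is.
escaped_dir = glob.escape("afolderwith[test]")

print(escaped_name)
print(escaped_dir)
```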
msg203179 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-17 14:37
Could anyone please review the patch before feature freeze?
msg203221 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-17 22:44
Updated patch addresses Ezio's and Eric's comments.
msg203274 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-18 10:33
Updated patch addresses Eric's comment.
msg203277 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2013-11-18 10:59
Looks good to me.
msg203278 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-11-18 11:07
New changeset 5fda36bff39d by Serhiy Storchaka in branch 'default': Issue #8402: Added the escape() function to the glob module. http://hg.python.org/cpython/rev/5fda36bff39d
msg203279 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-18 11:09
Thank you Ezio and Eric for your reviews.
History
Date User Action Args
2022-04-11 14:56:59 admin set github: 52649
2013-11-18 11:09:40 serhiy.storchaka set status: open -> closed; messages: +; assignee: serhiy.storchaka; resolution: fixed; stage: patch review -> resolved
2013-11-18 11:07:37 python-dev set nosy: + python-dev; messages: +
2013-11-18 10:59:31 eric.smith set messages: +
2013-11-18 10:33:55 serhiy.storchaka set files: + glob_escape_3.patch; messages: +
2013-11-17 22:44:37 serhiy.storchaka set files: + glob_escape_2.patch; messages: +
2013-11-17 15:48:06 maker set nosy: - maker
2013-11-17 14:37:43 serhiy.storchaka set messages: +
2013-06-13 10:31:51 martin.panter set nosy: + martin.panter
2013-03-11 14:39:07 serhiy.storchaka set files: + glob_escape.patch; messages: +
2013-03-07 16:32:53 serhiy.storchaka set messages: +
2013-03-07 16:28:03 mrabarnett set files: + fnmatch_implementation.py; nosy: + mrabarnett; messages: +
2012-12-05 19:35:28 serhiy.storchaka link issue16620 dependencies
2012-12-05 19:31:10 serhiy.storchaka set messages: +
2012-11-17 16:15:05 ezio.melotti set messages: +
2012-11-01 20:38:08 serhiy.storchaka set keywords: + needs review; assignee: docs@python -> (no value); resolution: not a bug -> (no value); components: - Documentation; stage: patch review
2012-10-15 14:53:26 serhiy.storchaka set title: Add a way to escape metacharacters in glob/fnmatch -> Add a function to escape metacharacters in glob/fnmatch
2012-10-15 14:50:24 serhiy.storchaka set files: + fnmatch_escape_2.patch; messages: +
2012-10-15 14:45:22 serhiy.storchaka set messages: +
2012-10-15 14:16:14 ezio.melotti set messages: +
2012-10-15 12:02:42 serhiy.storchaka link issue13929 superseder
2012-10-15 10:50:35 serhiy.storchaka set messages: +
2012-10-15 08:42:17 serhiy.storchaka set files: + fnmatch_escape.patch; messages: +
2012-10-15 07:38:07 serhiy.storchaka set nosy: + serhiy.storchaka, docs@python; messages: +; assignee: docs@python; components: + Documentation
2012-10-14 21:04:23 maker set files: + issue8402.1.patch; messages: +
2012-10-14 20:45:45 eric.araujo set nosy: + eric.araujo; messages: +; title: glob returns empty list with "[" character in the folder name -> Add a way to escape metacharacters in glob/fnmatch
2012-10-13 15:35:47 maker set files: + issue8402.patch; versions: + Python 3.4, - Python 3.2; nosy: + maker; messages: +; keywords: + patch
2012-10-11 12:15:02 a1abhishek set nosy: + a1abhishek; messages: +
2011-11-11 14:34:09 ezio.melotti set nosy: + ezio.melotti
2011-11-11 14:33:11 Tilka set nosy: + Tilka; messages: +; components: + Library (Lib)
2010-09-27 20:41:29 kveretennicov set nosy: + kveretennicov
2010-07-09 14:15:33 eric.smith set messages: +
2010-07-09 05:06:33 terry.reedy set nosy: + terry.reedy; messages: +; versions: + Python 3.2, - Python 2.5
2010-05-26 17:32:59 eric.smith set messages: +
2010-05-26 17:17:10 Aquinas set messages: +
2010-05-26 17:09:09 Aquinas set nosy: + Aquinas; messages: +
2010-04-15 02:21:42 r.david.murray set files: - unnamed
2010-04-15 01:51:14 l0nwlf set title: glob returns empty list with " -> glob returns empty list with "[" character in the folder name
2010-04-15 01:46:53 l0nwlf set messages: +
2010-04-15 01:43:17 george.hu set messages: +
2010-04-15 01:40:19 george.hu set files: + unnamed; messages: +; title: glob returns empty list with "[" character in the folder name -> glob returns empty list with "
2010-04-15 01:34:35 l0nwlf set status: pending -> open; type: behavior -> enhancement; messages: +
2010-04-15 01:27:56 eric.smith set status: open -> pending; messages: +
2010-04-15 01:16:54 george.hu set status: closed -> open; messages: +
2010-04-15 01:09:30 eric.smith set status: open -> closed; nosy: + eric.smith; messages: +; resolution: not a bug
2010-04-15 01:06:44 l0nwlf set nosy: + l0nwlf; messages: +
2010-04-15 00:51:25 george.hu create