msg55094 - (view) |
Author: Tom Lynn (tlynn) |
Date: 2007-04-27 11:35 |
I'd like to see a regexp.exact() method on regexp objects, equivalent to regexp.search(r'\A%s\Z' % pattern, ...), for parsing binary formats. It's probably not worth disturbing the current library interface for, but maybe in Py3k? |
|
|
msg55095 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2007-05-01 15:42 |
Moving to the feature requests tracker. Notice that in Py3k, the string type will be a Unicode type, so it's not clear to me that regular expressions on binary data will still be supported. |
|
|
msg74685 - (view) |
Author: Jeffrey C. Jacobs (timehorse) |
Date: 2008-10-13 13:31 |
Binary format searches should be supported once issue 1282 is implemented, likely as part of issue 2636 Item 32. That said, I'm not clear what you mean by exact search; wouldn't you want match instead? If your main issue is you want something that automatically binds to the beginning and ending of input, then I suppose we could add an 'exact' method where 'search' searches anywhere, 'match' matches from the start of input and 'exact' matches from beginning to ending. I'd call that a separate issue, though. In other words: byte-oriented matches is covered by 1282 and adding an 'exact' method is the only new issue here. Does that sound right? |
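
The three-way distinction described above (search anywhere, match from the start, match the whole input) can be sketched as follows. The `exact` function here is a hypothetical helper written for illustration, not an existing method of the re module; it assumes the pattern can safely be wrapped in a non-capturing group.

```python
import re


def exact(pattern, string, flags=0):
    """Hypothetical helper: succeed only if the whole string matches."""
    # The non-capturing group keeps alternations such as 'a|b' anchored
    # as a unit; without it, 'a|b\Z' would anchor only the 'b' branch.
    return re.match(r'(?:%s)\Z' % pattern, string, flags)


text = 'xfoobar'
found_search = re.search('foo', text)   # may match anywhere in the input
found_match = re.match('foo', text)     # must match at position 0
found_exact = exact('foo', 'foo')       # must cover the entire input
missed_exact = exact('foo', 'foobar')   # trailing 'bar' prevents a match
```

Here `found_search` and `found_exact` are match objects, while `found_match` and `missed_exact` are None.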
|
|
msg74688 - (view) |
Author: Tom Lynn (tlynn) |
Date: 2008-10-13 14:46 |
Yes, that's right. The binary aspect of it was something of a red herring, I'm afraid, although I still think that (or parsing in general) is an important use case. The prime motivation is that it's easy to either forget the '\Z' or to use '$' instead, both of which cause subtle bugs. An exact() method might help to avoid that. |
|
|
msg116676 - (view) |
Author: Matthew Barnett (mrabarnett) *  |
Date: 2010-09-17 16:08 |
Does this request still stand? If so then I'll add it to the new regex module. |
|
|
msg116688 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2010-09-17 17:27 |
I would say you should make the call on whether or not it is worth adding. IIUC it would mean there was more than one way to do something (\Z vs 'exact'), so I personally am -0 on the feature request. But I'm not a frequent regex user, so I don't think my opinion should count for much. |
|
|
msg116724 - (view) |
Author: Tom Lynn (tlynn) |
Date: 2010-09-17 21:48 |
I don't know whether it should stand, I'm somewhere around 0 on it myself. So I guess that means it shouldn't, since it's easier to add features than remove them. The problem is that once you're aware of the need for it you need it less. In case other people are +1, I'll note that "exact" isn't a very nice name either, not being a verb. "exact_match" is a bit long but probably better (and better than "match_exact"). |
|
|
msg116755 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2010-09-18 06:34 |
I'm not sure it really is so useful that it warrants a new regex method. |
|
|
msg116764 - (view) |
Author: Tom Lynn (tlynn) |
Date: 2010-09-18 11:42 |
I'm still unsure. I think this confusion does cause bugs in real-world code. Perhaps more prominence for \A and \Z in the docs? There's already a section comparing regexps starting '^' with match under "Matching vs Searching". The problem is basically that ^ and $ have weird semantics but are better recognised than \A and \Z. Looking over the docs again I see that the docs for $ are still misleading, in a way that's related to this issue: foo matches both 'foo' and 'foobar', while the regular expression foo$ matches only 'foo'. "foo$ matches only 'foo' (out of 'foo' and 'foobar')" is the correct interpretation of that, but it's easy to read it as "foo$ means exact_match('foo')", which is the misconception I was hoping to put to rest with this (foo$ also matches the 'foo' part of 'foo\nbar', even with flags=0). |
|
|
msg116765 - (view) |
Author: Tom Lynn (tlynn) |
Date: 2010-09-18 11:57 |
Actually, looking at the second part of the docs for $ (on "foo.$") makes me think the main motivating case here may be a bug in re.match::

    >>> re.match('foo$', 'foo\n\n')
    >>> re.match('foo$', 'foo\n')
    <_sre.SRE_Match object at 0x00A98678>

Shortening an input string shouldn't ever cause it to match, should it? |
|
|
msg116771 - (view) |
Author: Tom Lynn (tlynn) |
Date: 2010-09-18 12:51 |
Oh dear, I'm wrong on two fronts (I wish Roundup had post editing). a) foo$ doesn't match the 'foo' part of 'foo\nbar' as I stated above, but does match the 'foo' part of 'foo\n'. b) Obviously shortening an input string can cause it to match. It's still weird though. |
|
|
msg116833 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2010-09-18 22:45 |
Can we close this one? |
|
|
msg116837 - (view) |
Author: Matthew Barnett (mrabarnett) *  |
Date: 2010-09-18 23:03 |
'$' matches at the end of the string or at a newline at the end of a string (if multiline mode isn't turned on). '\Z' matches only at the end of the string. If not even the OP is convinced of the need, then I have no objection to closing. |
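
A minimal sketch of the difference just described, with multiline mode off. The variable names are illustrative only.

```python
import re

dollar = re.compile(r'foo$')    # end of string, or just before a final newline
strict = re.compile(r'foo\Z')   # end of string only

plain = dollar.match('foo')               # matches
one_newline = dollar.match('foo\n')       # matches: '$' allows one trailing newline
two_newlines = dollar.match('foo\n\n')    # None: the newline after 'foo' is not the last one
strict_newline = strict.match('foo\n')    # None: '\Z' rejects the trailing newline
strict_plain = strict.match('foo')        # matches
```

This is why a validation routine written with `$` can accept input that ends in a newline, whereas the same routine written with `\Z` does not.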
|
|
msg116843 - (view) |
Author: Tom Lynn (tlynn) |
Date: 2010-09-19 00:12 |
(Sorry to comment on a closed issue, it was closed as I was writing this.) It's not that I'm not convinced of the need, just not of the solution. I still think that there are problems here:

a) forgetting any \Z or $ terminator to .match() is easy,
b) $ is easily misunderstood (and not just by me) and I suspect commonly dangerously misused in validation routines as a result,
c) '(?:%s)\Z' % regexp is noisy, combines two less-understood features, and makes simple regexps hard to read,
d) '(?:%s)\Z' % regexp.pattern requires recompilation of the regexp.

I think another method is probably the best solution to these, but it may have too much cost (though I'm not sure what that cost would be). Largely orthogonally, I'd like to see \Z encouraged over $ in the docs, and preferably a version of this table (probably under Matching vs Searching), corrected if I'm wrong of course::

    NON-MULTILINE:
        '^' is equivalent to '\A'
        '$' is equivalent to '(?:\Z|(?=\n\Z))'
    MULTILINE:
        '^' is equivalent to '(?:\A|(?<=\n))'
        '$' is equivalent to '(?:\Z|(?=\n))'

But the docs already try to express the above table (or its correction) in English, so you may feel it wouldn't add anything, in which case I'd still like to see any corrections for my own edification if possible. |
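
One way to check the table above against the re module is to compare, for a handful of sample strings, the positions at which each anchor and its spelled-out form match. This is only a sketch: the sample strings and the `positions` helper are assumptions made for illustration, and agreement on these samples is evidence for the table, not a proof.

```python
import re

SAMPLES = ['', '\n', 'foo', 'foo\n', 'foo\n\n', 'foo\nbar', 'foo\nbar\n']

# (flags, anchor, spelled-out equivalent claimed in the table)
EQUIVALENTS = [
    (0, r'^', r'\A'),
    (0, r'$', r'(?:\Z|(?=\n\Z))'),
    (re.MULTILINE, r'^', r'(?:\A|(?<=\n))'),
    (re.MULTILINE, r'$', r'(?:\Z|(?=\n))'),
]


def positions(pattern, text, flags=0):
    """Return every index at which the zero-width pattern matches."""
    regex = re.compile(pattern, flags)
    # Passing pos (instead of slicing) keeps the real start of the string
    # and the preceding characters visible to the anchors and lookbehind.
    return [i for i in range(len(text) + 1) if regex.match(text, i)]


mismatches = [
    (flags, anchor, text)
    for flags, anchor, spelled_out in EQUIVALENTS
    for text in SAMPLES
    if positions(anchor, text, flags) != positions(spelled_out, text, flags)
]
```

An empty `mismatches` list means the two spellings agreed at every position of every sample.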
msg230856 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2014-11-08 13:26 |
Was implemented as fullmatch() in . |
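
For reference, a short sketch of how fullmatch() addresses the points raised in this thread. Both the module-level function and the method on compiled patterns are available in Python 3.4 and later; the sample patterns and inputs are illustrative only.

```python
import re

digits = re.compile(r'\d+')

# The compiled pattern is reused as-is: no '(?:...)\Z' wrapping or recompilation.
whole = digits.fullmatch('12345')              # matches
with_newline = digits.fullmatch('12345\n')     # None: trailing newline rejected

# The '$' spelling silently accepts the trailing newline.
loose = re.match(r'\d+$', '12345\n')           # matches '12345'

# Alternations are anchored as a unit, so the longer branch is tried too.
alternation = re.fullmatch(r'a|ab', 'ab')      # matches 'ab'
```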
|
|