msg25327 - (view) |
Author: Skip Montanaro (skip.montanaro) *  |
Date: 2005-05-15 21:59 |
This seems wrong to me:

    >>> re.match("(UNIX{})", "UNIX{}").groups()
    ('UNIX',)

With no numbers or commas, "{}" should not be considered special in the pattern. The docs identify three numeric repetition possibilities: {m}, {m,} and {m,n}. There's no description of {} meaning anything. Either the docs should say {} implies {1,1}, {} should have no special meaning, or an exception should be raised during compilation of the regular expression. |
|
|
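For reference, a minimal sketch of the three documented repetition forms next to the bare "{}" from the report. It assumes an interpreter that already includes the fix recorded later in this thread, where "{}" is matched literally; the interpreter used in the report returned ('UNIX',) for the last call instead. The variable names are illustrative only.

```python
import re

# The three documented numeric repetition forms.
exactly_two = re.fullmatch(r"ab{2}", "abb")        # {m}: exactly m copies
two_or_more = re.fullmatch(r"ab{2,}", "abbbb")     # {m,}: m or more copies
one_to_three = re.fullmatch(r"ab{1,3}", "abbbb")   # {m,n}: four b's exceed n, so no match

# A bare "{}" carries no count; a fixed interpreter treats it as two literal characters.
unix_groups = re.match("(UNIX{})", "UNIX{}").groups()
```

On a fixed interpreter the captured group is the whole string, braces included.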
msg25328 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2005-06-01 16:54 |
Logged In: YES user_id=1188172 It's interesting what other RE implementations do with this ambiguity: Perl treats {} as literal in REs, as Skip proposes. Ruby does, too, but issues a warning about } being unescaped. GNU (e)grep v2.5.1 allows a bare {} only if it is at the start of a RE, and then matches it literally. GNU sed v4.1.4 never allows it. GNU awk v3.1.4 is gracious and acts like Perl. Attached is a patch that changes this behaviour to what appears to be the common-sense one. |
|
|
msg25329 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2005-06-01 20:25 |
Logged In: YES user_id=80475 IMO, the simplest rule is that braces always be considered special. This accommodates future extensions, simplifies the re compiler, and makes it easier to know what needs to be escaped. |
|
|
msg25330 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2005-06-01 20:30 |
Logged In: YES user_id=1188172 So, should a {} raise an error, or warn like in Ruby? |
|
|
msg25331 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2005-06-01 21:07 |
Logged In: YES user_id=80475 I prefer Skip's third option, raising an exception during compilation. This is an re syntax error. Treat it the same way that we handle similar situations with regular Python:

    >>> a[]
    SyntaxError: invalid syntax |
|
|
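To illustrate what "raising an exception during compilation" looks like for a pattern that re already rejects, here is a small sketch. The helper name `compiles` is hypothetical, and an unterminated group stands in for the brace case, which this thread ultimately did not turn into an error.

```python
import re

def compiles(pattern):
    """Return True if `pattern` compiles, False if re rejects it."""
    try:
        re.compile(pattern)
    except re.error:
        # re.error is raised at compile time, before any matching happens.
        return False
    return True

unbalanced_ok = compiles("(abc")   # unterminated group: a regex syntax error
balanced_ok = compiles("(abc)")    # well-formed group
```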
msg25332 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2005-06-01 21:32 |
Logged In: YES user_id=1188172 Okay. Attaching patch which does that. BTW, these things are currently allowed too (treated as literals): "{", "{x", "{x}", "{x,y}", "{1,x}", etc. The patch changes that, too. |
|
|
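The brace constructs listed above can be checked directly. This is a sketch that assumes an interpreter where such constructs are still accepted as literals, which is the outcome this thread settles on; each pattern is matched against its own text.

```python
import re

# Brace constructs from the message above that are not valid repeat specifiers.
non_repeats = ["{", "{x", "{x}", "{x,y}", "{1,x}"]

# If a construct is read literally, the pattern matches its own text in full.
literal_results = {p: re.fullmatch(p, p) is not None for p in non_repeats}
```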
msg25333 - (view) |
Author: Skip Montanaro (skip.montanaro) *  |
Date: 2005-06-02 11:16 |
Logged In: YES user_id=44345 In the absence of strong technical reasons, I'd vote to do what Perl does. I believe the assumption all along has been that most people coming to Python who already know how to use regular expressions are Perl programmers. It wouldn't seem to make sense to throw little land mines in their paths. I realize that explicit is better than implicit, but practicality beats purity. |
|
|
msg25334 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2005-06-03 08:01 |
Logged In: YES user_id=1188172 I just realized that e.g. the string module uses unescaped braces, so I think we should not become overly strict as it would break much code... Perhaps the original patch (sre-brace-diff) is better... |
|
|
msg25335 - (view) |
Author: Skip Montanaro (skip.montanaro) *  |
Date: 2005-06-03 15:13 |
Logged In: YES user_id=44345 Can you elaborate? I fail to see what the string module has to do with the re module. Can you give an example of code that would break? |
|
|
msg25336 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2005-06-03 18:00 |
Logged In: YES user_id=1188172 Raymond said that braces should always be considered special. This includes constructs like "{(?P.*)}" which the string module uses, and which would be a syntax error then. |
|
|
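A sketch of the kind of pattern at stake here: unescaped braces wrapped around a named group. The group name `name` and the sample input are illustrative assumptions, not the string module's actual pattern.

```python
import re

# Hypothetical pattern: literal, unescaped braces around a named group.
braced = re.compile(r"{(?P<name>.*)}")

match = braced.fullmatch("{who}")
captured = match.group("name") if match else None
```

If every brace were treated as special, a pattern of this shape would fail to compile, which is the backwards-compatibility concern raised above.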
msg25337 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2005-06-03 18:46 |
Logged In: YES user_id=80475 Hmm, it looks like they cannot be treated differently without breaking backwards compatibility. |
|
|
msg25338 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2005-06-03 19:10 |
Logged In: YES user_id=1188172 Then, I think, we should follow Perl's behaviour and treat "{}" as a literal, just like every other brace construct that isn't a repeat specifier. |
|
|
msg25339 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2005-08-31 21:55 |
Logged In: YES user_id=1188172 Any more objections against treating "{}" as literal? The impact on existing code will be minimal, as I presume no one will write "{}" in a RE instead of "{1,1}" (well, who writes "{1,1}" anyway...). |
|
|
msg25340 - (view) |
Author: Gustavo Niemeyer (niemeyer) *  |
Date: 2005-08-31 22:11 |
Logged In: YES user_id=7887 I support Skip's opinion on following whatever Perl is currently doing, if that won't lead to unexpected errors in currently running code which was considered sane (expecting {} to behave like {1,1} is not sane :-). Your original patch looks suboptimal though (look at the tests around it). I'll fix it, or if you prefer to do it by yourself, I may apply the patch/review it/whatever. :-) |
|
|
msg25341 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2005-08-31 22:16 |
Logged In: YES user_id=1188172 No, you're the expert, so you'll get the honor of fixing it. :P |
|
|
msg25342 - (view) |
Author: Gustavo Niemeyer (niemeyer) *  |
Date: 2005-09-14 08:58 |
Logged In: YES user_id=7887 Fixed in:

    Lib/sre_parse.py: 1.64 -> 1.65
    Lib/test/test_re.py: 1.55 -> 1.56
    Misc/NEWS: 1.1360 -> 1.1361

Notice that Perl will also handle constructs like '{,2}' as literals, while Python will consider them as '{0,2}'. I think it's too late to change that one though, as this behavior may be relied upon in code out there. |
|
|
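The remaining difference from Perl noted above, '{,2}' being read as '{0,2}', can be sketched as follows. The variable names are illustrative only.

```python
import re

pattern = re.compile(r"a{,2}")

# Read as a{0,2}: zero, one or two copies of "a" match; three do not.
as_repeat = [pattern.fullmatch(s) is not None for s in ("", "a", "aa", "aaa")]

# Read as a literal (Perl's behaviour, per the message above) this would match; in Python it does not.
as_literal = pattern.fullmatch("a{,2}") is not None
```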
msg25343 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2005-09-14 10:58 |
Logged In: YES user_id=1188172 Will you backport the fix? |
|
|
msg25344 - (view) |
Author: Josiah Carlson (josiahcarlson) *  |
Date: 2005-09-15 06:07 |
Logged In: YES user_id=341410 Was it a bug, or was it merely confusing semantics? |
|
|
msg25345 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2005-09-15 06:12 |
Logged In: YES user_id=1188172 I would say bug. |
|
|