msg39186 - (view) |
Author: Greg Chapman (glchapman) |
Date: 2002-03-08 13:14 |
Bug report 470582 points out that nested groups can produces matches in sre even if the groups within which they are nested do not match: >>> m = sre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d) ","34.123") >>m.groups()(None,′3′,′34′,′123′) >>m=pre.search(r"((d):)?(dd).(ddd)", "34.123") >>> m.groups() (None, '3', '34', '123') >>> m = pre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d) ","34.123")>>m.groups()(None,′3′,′34′,′123′)>>m=pre.search(r"((d):)?(dd).(ddd)", "34.123") >>> m.groups() (None, None, '34', '123') I believe this is because in the handling of SRE_OP_MAX_UNTIL, state->lastmark is being reduced (after "((\d)\:)" fails) without NULLing out the now- invalid entries at the end of the state->mark array. In the other two cases where state->lastmark is reduced (specifically in SRE_OP_BRANCH and SRE_OP_REPEAT_ONE) memset is used to NULL out the entries at the end of the array. The attached patch does the same thing for the SRE_OP_MAX_UNTIL case. This fixes the above case and does not break anything in test_re.py. |
|
|
msg39187 - (view) |
Author: Greg Chapman (glchapman) |
Date: 2002-03-08 13:20 |
Logged In: YES user_id=86307 I forgot: here's a patch for re_tests.py which adds the case from the bug report as a test. |
|
|
msg39188 - (view) |
Author: Neal Norwitz (nnorwitz) *  |
Date: 2002-03-08 13:29 |
Logged In: YES user_id=33168 Confirmed that the test w/o fix fails and the test passes with the fix to _sre.c. But I'm not sure if the memset can go too far: memset(state->mark + lastmark + 1, 0, (state->lastmark - lastmark) * sizeof(void*)); I can try under purify, but that doesn't guarantee anything. |
|
|
msg39189 - (view) |
Author: Greg Chapman (glchapman) |
Date: 2002-03-08 15:23 |
Logged In: YES user_id=86307 I'm pretty sure the memset is correct; state->lastmark is the index of last mark written to (not the index of the next potential write). Also, it occurred to me that there is another related error here: >>> m = sre.search(r'^((\d)\:)?\d\d\.\d\d\d$', '34.123') >>> m.groups() (None, None) >>> m.lastindex 2 In other words, lastindex claims that group 2 was the last that matched, even though it didn't really match. Since lastindex is undocumented, this probably doesn't matter too much. Still, it probably should be reset if it is pointing to a group which gets "unmatched" when state->lastmark is reduced. Perhaps a function like the following should be added for use in the three places where state->lastmark is reset to a previous value: void lastmark_restore(SRE_STATE *state, int lastmark) { assert(lastmark >= 0); if (state->lastmark > lastmark) { int lastvalidindex = (lastmark == 0) ? -1 : (lastmark-1)/2+1; if (state->lastindex > lastvalidindex) state->lastindex = lastvalidindex; memset( state->mark + lastmark + 1, 0, (state->lastmark - lastmark) * sizeof(void*) ); } state->lastmark = lastmark; } |
|
|
msg39190 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2002-03-08 18:28 |
Logged In: YES user_id=31435 Assigned to /F -- he's the expert here. |
|
|
msg39191 - (view) |
Author: Fredrik Lundh (effbot) *  |
Date: 2002-07-12 11:11 |
Logged In: YES user_id=38376 (bumped priority as a reminder to self) /F |
|
|
msg39192 - (view) |
Author: Greg Chapman (glchapman) |
Date: 2002-08-12 21:17 |
Logged In: YES user_id=86307 I noticed recently that the lastindex attribute of match objects is now documented, so I believe that the lastindex problem I described in my March 8 posting needs to be fixed. Simply, lastindex may claim that a group matched when in fact it didn't (because lastindex does not get updated when lastmark is reset to a lower value): >>> m = sre.match('(\d)?\d\d', '12') >>> m.groups() (None,) >>> m.lastindex 1 |
|
|
msg39193 - (view) |
Author: Greg Chapman (glchapman) |
Date: 2002-10-04 16:51 |
Logged In: YES user_id=86307 Assuming this patch is acceptable (I see it has not yet been applied to _sre.c), I wonder if it would be a good candidate for a backport to 2.2.2? (Though it still lacks a fix for the lastindex problem.) |
|
|
msg39194 - (view) |
Author: Gustavo Niemeyer (niemeyer) *  |
Date: 2002-11-06 14:24 |
Logged In: YES user_id=7887 Applied as: Lib/test/re_tests.py:1.30->1.31 Lib/test/test_sre.py:1.37->1.38 Misc/NEWS:1.511->1.512 Modules/_sre.c:2.83->2.84 Thank you very much! |
|
|