Issue 10328: re.sub[n] doesn't seem to handle /Z replacements correctly in all cases
Issue10328
Created on 2010-11-05 14:33 by Alexander.Schmolck, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Messages (4) | ||
---|---|---|
msg120499 - (view) | Author: Alexander Schmolck (Alexander.Schmolck) | Date: 2010-11-05 14:33 |
In certain cases a zero-width \Z match that should be replaced isn't. An example might help:

re.compile('(?m)(?P<trailing_ws>[ \t]+\r*$)|(?P<no_final_newline>(?<=[^\n])\Z)').subn(lambda m:next('<'+k+'>' for k,v in m.groupdict().items() if v is not None), 'foobar ')

This gives

('foobar<trailing_ws>', 1)

I would have expected

('foobar<trailing_ws><no_final_newline>', 2)

Contrast this with the following behavior:

[m.span() for m in re.compile('(?P<trailing_ws>[ \t]+\r*$)|(?P<no_final_newline>(?<=[^\n])\Z)', re.M).finditer('foobar ')]

gives

[(6, 7), (7, 7)]

The matches are clearly not overlapping, and the re module docs for sub say "Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl.", so I would have expected two replacements.

This seems to be what perl is doing:

echo -n 'foobar ' | perl -pe 's/(?m)(?P<trailing_ws>[ \t]+\r*$) |
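The finditer comparison from the report can be reproduced with a more readable script. This is only a sketch: the pattern is the one from the message, written as a raw string, while the names `PATTERN` and `labelled_spans` are made up for illustration.

```python
import re

# Pattern from the report, as a raw string so that \Z reaches the regex engine.
PATTERN = re.compile(
    r"(?P<trailing_ws>[ \t]+\r*$)|(?P<no_final_newline>(?<=[^\n])\Z)",
    re.M,
)


def labelled_spans(text):
    """Return (group name, span) for every match that finditer reports."""
    # lastgroup is the name of the single named group that took part in the match.
    return [(m.lastgroup, m.span()) for m in PATTERN.finditer(text)]


spans = labelled_spans("foobar ")
print(spans)  # [('trailing_ws', (6, 7)), ('no_final_newline', (7, 7))]
```

The second span is empty and starts exactly where the first one ends, which is the situation the report is about.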
msg120535 - (view) | Author: Matthew Barnett (mrabarnett) * | Date: 2010-11-05 21:09 |
It's a bug caused by trying to avoid getting stuck when a zero-width match is found. Basically the fix is to advance one character after a zero-width match, but that doesn't always give the correct result. There are a number of related issues like issue #1647489 ("zero-length match confuses re.finditer()").
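The trade-off described above can be illustrated with a toy substitution loop. This is a simplified model, not CPython's implementation: the helper `subn_model`, its `skip_adjacent_empty` flag, and the assumption that the guard works by dropping an empty match that touches the previous match are all made up for this sketch.

```python
import re


def subn_model(pattern, repl, text, skip_adjacent_empty):
    """Toy substitution loop (hypothetical helper, not CPython's code).

    repl is a callable that takes a match object and returns a string.
    """
    pieces = []
    copied = 0  # text[:copied] has already been emitted
    pos = 0     # where the next search starts
    count = 0   # replacements made so far
    while pos <= len(text):
        m = pattern.search(text, pos)
        if m is None:
            break
        start, end = m.span()
        is_empty = start == end
        if skip_adjacent_empty and is_empty and count > 0 and start == copied:
            # Stricter rule: drop an empty match that touches the previous match.
            pass
        else:
            pieces.append(text[copied:start])
            pieces.append(repl(m))
            copied = end
            count += 1
        # Step past an empty match so the loop cannot get stuck on it.
        pos = end + 1 if is_empty else end
    pieces.append(text[copied:])
    return "".join(pieces), count


ISSUE_PATTERN = re.compile(
    r"(?m)(?P<trailing_ws>[ \t]+\r*$)|(?P<no_final_newline>(?<=[^\n])\Z)"
)


def tag(match):
    # Name of the group that matched, wrapped in angle brackets.
    return "<" + match.lastgroup + ">"


strict = subn_model(ISSUE_PATTERN, tag, "foobar ", skip_adjacent_empty=True)
lenient = subn_model(ISSUE_PATTERN, tag, "foobar ", skip_adjacent_empty=False)
print(strict)   # ('foobar<trailing_ws>', 1)
print(lenient)  # ('foobar<trailing_ws><no_final_newline>', 2)
```

With the stricter rule the model gives the result from the report, and without it the result the reporter expected. In both modes the loop still steps one character past an empty match, so it terminates.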
msg228010 - (view) | Author: Mark Lawrence (BreamoreBoy) * | Date: 2014-09-30 21:47 |
@Serhiy, can you take a look at this, as I recall you've been doing some regex work?
msg340960 - (view) | Author: Ma Lin (malin) * | Date: 2019-04-27 02:37 |
This bug was fixed in Python 3.7, see .

Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 23 2018, 23:31:17) [MSC v.1916 32 bit (Intel)] on win32
>>> re.compile('(?m)(?P<trailing_ws>[ \t]+\r*$)|(?P<no_final_newline>(?<=[^\n])\Z)').subn(lambda m:next('<'+k+'>' for k,v in m.groupdict().items() if v is not None), 'foobar ')
('foobar<trailing_ws>', 1)

Python 3.7.3rc1 (tags/v3.7.3rc1:69785b2127, Mar 12 2019, 22:37:55) [MSC v.1916 64 bit (AMD64)] on win32
>>> re.compile('(?m)(?P<trailing_ws>[ \t]+\r*$)|(?P<no_final_newline>(?<=[^\n])\Z)').subn(lambda m:next('<'+k+'>' for k,v in m.groupdict().items() if v is not None), 'foobar ')
('foobar<trailing_ws><no_final_newline>', 2)
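The two sessions above can be folded into one script that picks the expected result from the running interpreter version. This is a sketch: the names `PATTERN`, `tag`, `result` and `expected` are illustrative, and the `x*` check at the end is the example the `re.sub` documentation uses for the 3.7 change in how empty matches are handled.

```python
import re
import sys

# Pattern from the report, as a raw string so that \Z reaches the regex engine.
PATTERN = re.compile(
    r"(?m)(?P<trailing_ws>[ \t]+\r*$)|(?P<no_final_newline>(?<=[^\n])\Z)"
)


def tag(match):
    """Wrap the name of the group that matched in angle brackets."""
    return next("<" + k + ">" for k, v in match.groupdict().items() if v is not None)


result = PATTERN.subn(tag, "foobar ")

if sys.version_info >= (3, 7):
    # An empty match right after a non-empty match is replaced as well.
    expected = ("foobar<trailing_ws><no_final_newline>", 2)
    expected_dashes = "-a-b--d-"
else:
    # Older behaviour, as shown in the 3.6.8 session.
    expected = ("foobar<trailing_ws>", 1)
    expected_dashes = "-a-b-d-"

print(result)
print(re.sub("x*", "-", "abxd"))
```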
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:08 | admin | set | github: 54537 |
2019-04-27 04:41:18 | serhiy.storchaka | set | status: open -> closed; resolution: out of date; stage: resolved |
2019-04-27 02:37:04 | malin | set | nosy: + malin; messages: + |
2019-04-26 20:42:37 | BreamoreBoy | set | nosy: - BreamoreBoy |
2014-09-30 21:47:20 | BreamoreBoy | set | nosy: + BreamoreBoy, serhiy.storchaka; messages: +; versions: + Python 3.4, Python 3.5, - Python 3.1 |
2010-11-05 23:00:52 | terry.reedy | set | versions: + Python 2.7, - Python 2.6 |
2010-11-05 21:09:58 | mrabarnett | set | messages: + |
2010-11-05 15:38:19 | r.david.murray | set | nosy: + mrabarnett |
2010-11-05 14:33:47 | Alexander.Schmolck | create |