Issue 10328: re.sub[n] doesn't seem to handle /Z replacements correctly in all cases

Created on 2010-11-05 14:33 by Alexander.Schmolck, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (4)
msg120499 - Author: Alexander Schmolck (Alexander.Schmolck) Date: 2010-11-05 14:33
In certain cases a zero-width \Z match that should be replaced isn't. An example might help:

re.compile('(?m)(?P<trailing_ws>[ \t]+\r*$)|(?P<no_final_newline>(?<=[^\n])\Z)').subn(lambda m:next('<'+k+'>' for k,v in m.groupdict().items() if v is not None), 'foobar ')

this gives

('foobar<trailing_ws>', 1)

I would have expected

('foobar<trailing_ws><no_final_newline>', 2)

Contrast this with the following behavior:

[m.span() for m in re.compile('(?P<trailing_ws>[ \t]+\r*$)|(?P<no_final_newline>(?<=[^\n])\Z)', re.M).finditer('foobar ')]

gives

[(6, 7), (7, 7)]

The matches are clearly not overlapping and the re module docs for sub say "Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl.", so I would have expected two replacements.

This seems to be what perl is doing:

echo -n 'foobar ' | perl -pe 's/(?m)(?P<trailing_ws>[ \t]+\r*$)
msg120535 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2010-11-05 21:09
It's a bug caused by trying to avoid getting stuck when a zero-width match is found. Basically the fix is to advance one character after a zero-width match, but that doesn't always give the correct result. There are a number of related issues like issue #1647489 ("zero-length match confuses re.finditer()").
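The "advance one character" idea can be illustrated with a toy model. This is only a sketch, not the actual sre code: `toy_subn`, `label` and the `allow_adjacent_empty` switch are hypothetical names made up for the illustration, and the loop is built on `Pattern.search(string, pos)` rather than on the engine's internal state. With the switch off, an empty match that starts exactly where the previous match ended is dropped, which reproduces the reported result. With the switch on, it is replaced.

```python
import re

# Pattern from the report, written with raw strings.
PATTERN = re.compile(
    r"(?m)(?P<trailing_ws>[ \t]+\r*$)"
    r"|(?P<no_final_newline>(?<=[^\n])\Z)"
)


def label(match):
    """Replacement callback: the name of the alternative that matched."""
    return "<" + match.lastgroup + ">"


def toy_subn(pattern, repl, string, allow_adjacent_empty):
    """Toy model of subn(); an assumption-laden sketch, NOT CPython's code.

    allow_adjacent_empty=False drops an empty match that starts exactly
    where the previous match ended; True replaces it.
    """
    pieces = []
    count = 0
    copied = 0  # everything before this index is already in `pieces`
    pos = 0     # index at which the next search starts
    while pos <= len(string):
        m = pattern.search(string, pos)
        if m is None:
            break
        start, end = m.span()
        if start == end:
            # Zero-width match: restart one character later, otherwise
            # the loop would find the same empty match forever.
            pos = end + 1
            if not allow_adjacent_empty and count > 0 and start == copied:
                continue  # the empty match is silently skipped
        else:
            pos = end
        pieces.append(string[copied:start])
        pieces.append(repl(m))
        copied = end
        count += 1
    pieces.append(string[copied:])
    return "".join(pieces), count


if __name__ == "__main__":
    print(toy_subn(PATTERN, label, "foobar ", allow_adjacent_empty=False))
    print(toy_subn(PATTERN, label, "foobar ", allow_adjacent_empty=True))
```

The toy always jumps one character ahead after an empty match, so it can miss a non-empty match that begins at the same position. That is the kind of "doesn't always give the correct result" case mentioned above.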
msg228010 - Author: Mark Lawrence (BreamoreBoy) * Date: 2014-09-30 21:47
@Serhiy can you take a look at this as I recall you've been doing some regex work?
msg340960 - Author: Ma Lin (malin) * Date: 2019-04-27 02:37
This bug was fixed in Python 3.7, see .

Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 23 2018, 23:31:17) [MSC v.1916 32 bit (Intel)] on win32
>>> re.compile('(?m)(?P<trailing_ws>[ \t]+\r*$)|(?P<no_final_newline>(?<=[^\n])\Z)').subn(lambda m:next('<'+k+'>' for k,v in m.groupdict().items() if v is not None), 'foobar ')
('foobar<trailing_ws>', 1)

Python 3.7.3rc1 (tags/v3.7.3rc1:69785b2127, Mar 12 2019, 22:37:55) [MSC v.1916 64 bit (AMD64)] on win32
>>> re.compile('(?m)(?P<trailing_ws>[ \t]+\r*$)|(?P<no_final_newline>(?<=[^\n])\Z)').subn(lambda m:next('<'+k+'>' for k,v in m.groupdict().items() if v is not None), 'foobar ')
('foobar<trailing_ws><no_final_newline>', 2)
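For anyone wanting to re-check this on their own interpreter, a small script along these lines should do. It is a sketch only: `check`, `label` and `expected_subn` are hypothetical helper names, the pattern is the one from the report rewritten with raw strings, and the version cut-off simply encodes the two results quoted in this thread.

```python
import re
import sys

# Pattern from the report; raw strings avoid invalid-escape warnings for \Z.
PATTERN = re.compile(
    r"(?m)(?P<trailing_ws>[ \t]+\r*$)"
    r"|(?P<no_final_newline>(?<=[^\n])\Z)"
)


def label(match):
    """Replacement callback: the name of the alternative that matched."""
    return "<" + match.lastgroup + ">"


def check(text="foobar "):
    """Return (finditer spans, subn result) for the text from the report."""
    spans = [m.span() for m in PATTERN.finditer(text)]
    return spans, PATTERN.subn(label, text)


def expected_subn():
    """Result quoted in this thread for the running interpreter's version."""
    if sys.version_info >= (3, 7):
        return ("foobar<trailing_ws><no_final_newline>", 2)
    return ("foobar<trailing_ws>", 1)


if __name__ == "__main__":
    spans, result = check()
    print(sys.version.split()[0], spans, result)
    print("matches the thread:", result == expected_subn())
```

If the two results disagree on 3.7 or later, that would suggest a regression rather than this issue.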
History
Date User Action Args
2022-04-11 14:57:08 admin set github: 54537
2019-04-27 04:41:18 serhiy.storchaka set status: open -> closed; resolution: out of date; stage: resolved
2019-04-27 02:37:04 malin set nosy: + malin; messages: +
2019-04-26 20:42:37 BreamoreBoy set nosy: - BreamoreBoy
2014-09-30 21:47:20 BreamoreBoy set nosy: + BreamoreBoy, serhiy.storchaka; messages: +; versions: + Python 3.4, Python 3.5, - Python 3.1
2010-11-05 23:00:52 terry.reedy set versions: + Python 2.7, - Python 2.6
2010-11-05 21:09:58 mrabarnett set messages: +
2010-11-05 15:38:19 r.david.murray set nosy: + mrabarnett
2010-11-05 14:33:47 Alexander.Schmolck create