[Python-Dev] Finding overlapping matches with re assertions: bug or feature?
Tim Peters tim.peters at gmail.com
Fri Nov 15 07:48:33 CET 2013
I was surprised to find that "this works": if you want to find all overlapping matches for a regexp R, wrap it in
(?=(R))
and feed it to (say) finditer. Here's a very simple example, finding all overlapping occurrences of "xx":
import re

pat = re.compile("(?=(xx))")
for it in pat.finditer("xxxx"):
    print(it.span(1))
That displays:
(0, 2)
(1, 3)
(2, 4)
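For contrast, here is a minimal sketch comparing the plain pattern with the lookahead-wrapped one on the same string. The variable names (text, plain, wrapped) are just for illustration. It only shows what is observed, not whether the behavior is intended: the plain pattern consumes the characters it matches, so its matches cannot overlap, while the wrapped pattern's overall match is empty at each position and group 1 carries the non-empty text.

```python
import re

text = "xxxx"

# Plain pattern: each match consumes two characters, so only
# non-overlapping occurrences are reported.
plain = [m.span() for m in re.finditer("xx", text)]

# Lookahead wrapper: the overall match (group 0) is empty, so the scan
# moves ahead one position at a time; group 1 holds what the lookahead saw.
wrapped = [(m.span(), m.span(1)) for m in re.finditer("(?=(xx))", text)]

print(plain)    # [(0, 2), (2, 4)]
print(wrapped)  # [((0, 0), (0, 2)), ((1, 1), (1, 3)), ((2, 2), (2, 4))]
```
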
Is that a feature? Or an accident? It's very surprising to find a non-empty match inside an empty match (the outermost lookahead assertion). If it's intended behavior, it's just in time for the holiday season; e.g., to generate ASCII art for half an upside-down Christmas tree:
pat = re.compile("(?=(x+))")
for it in pat.finditer("xxxxxxxxxx"):
    print(it.group(1))
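A small sketch of what that loop produces, collecting the rows in a list (rows and lengths are illustrative names, not part of the snippet above). Since x+ is greedy, the group captured at each starting position runs to the end of the string, so each row should be one character shorter than the one before it.

```python
import re

pat = re.compile("(?=(x+))")

# One captured group per starting position in the ten-character string.
rows = [m.group(1) for m in pat.finditer("x" * 10)]

# Row lengths shrink from 10 down to 1.
lengths = [len(row) for row in rows]

print("\n".join(rows))
```
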