[Python-Dev] re.split on empty patterns
Tim Peters tim.peters at gmail.com
Sat Aug 7 19:58:16 CEST 2004
[A.M. Kuchling] <amk at amk.ca> wrote:
> The re.split() method ignores zero-length pattern matches. Patch #988761 adds an emptyok flag to split that causes zero-length matches to trigger a split.
> ...
> IMHO this feature is clearly useful,
Yes it is! Or, more accurately, it can be, when it's intended to match an empty string. It's a bit fuzzy because regexps are so error-prone, and writing a regexp that matches an empty string by accident is easy.
> and would be happy to commit the patch as-is.
Haven't looked at the patch, though.
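To make the "intended vs. accidental" distinction concrete, here is a minimal sketch. The helper `split_at_empty_matches` is hypothetical (it is not part of the re module and is not the patch under discussion); it only assumes that re.finditer reports zero-length matches, and it cuts the string at each one.

```python
import re

def split_at_empty_matches(pattern, string):
    """Hypothetical helper: cut `string` at every zero-length match of `pattern`."""
    cuts = [m.start() for m in re.finditer(pattern, string)
            if m.start() == m.end()]
    bounds = [0] + cuts + [len(string)]
    return [string[a:b] for a, b in zip(bounds, bounds[1:])]

# Intentional empty match: a lookahead consumes nothing, so the pattern
# marks the position just before each capital letter.
print(split_at_empty_matches(r'(?=[A-Z])', 'splitOnEmptyMatch'))
# ['split', 'On', 'Empty', 'Match']

# Accidental empty match: 'x*' happily matches zero x's at position 0.
print(repr(re.match(r'x*', 'abc').group()))
# ''
```

The first call is the kind of use the feature is meant for; the second shows how easily a starred pattern matches nothing at all.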
> Question: do we want to make this option the new default? Existing patterns that can produce zero-length matches would change their meanings:
>
> >>> re.split('x*', 'abxxxcdefxxx')
> ['ab', 'cdef', '']
> >>> re.split('x*', 'abxxxcdefxxx', emptyok=True)
> ['', 'a', 'b', '', 'c', 'd', 'e', 'f', '', '']
>
> (I think the result of the second match points up a bug in the patch; the empty strings in the middle seem wrong to me. Assume that gets fixed.)
Agreed.
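For readers who want to reproduce the two results quoted above without the patch, the following is a rough sketch built on re.finditer. The function `split_sketch` and its `split_on_empty` parameter are made up for illustration; this is an assumption about how such a flag could behave, not the code in patch #988761.

```python
import re

def split_sketch(pattern, string, split_on_empty=False):
    """Hypothetical stand-in for a split with an 'empty matches count' flag."""
    pieces = []
    last = 0
    for m in re.finditer(pattern, string):
        # A zero-length match has start == end; skip it unless asked not to.
        if m.start() == m.end() and not split_on_empty:
            continue
        pieces.append(string[last:m.start()])
        last = m.end()
    pieces.append(string[last:])
    return pieces

print(split_sketch('x*', 'abxxxcdefxxx'))
# ['ab', 'cdef', '']
print(split_sketch('x*', 'abxxxcdefxxx', split_on_empty=True))
# ['', 'a', 'b', '', 'c', 'd', 'e', 'f', '', '']
```

In this sketch the empty strings in the middle and at the end come from the zero-length match that finditer reports immediately after each run of x's. Whether a split should treat that position as a separate match is exactly the point questioned above.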
> Anyway, we therefore can't just make this the default in 2.4. We could trigger a warning when emptyok is not supplied and a split pattern results in a zero-length match; users could supply emptyok=False to avoid the warning. Patterns that never have a zero-length match would never get the warning. 2.5 could then set emptyok to True.
>
> Note: raising the warning might cause a serious performance hit for patterns that get zero-length matches a lot, which would make 2.4 slower in certain cases.
If you don't intend to change the default, there's no problem. I like "no problem". This isn't so useful so often that it can't afford to wait for Python 3 to change. In the meantime, "emptyok" is an odd name since it's always "ok" to have an empty match. "split0=True" reads better to me, since the effect is to split on a 0-length match. split_on_empty_match would be too wordy.
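If the default were to change after all, the warning scheme described in the quoted text might look roughly like the sketch below. Everything here is an assumption for illustration: `split_with_warning` is a hypothetical wrapper, `split0` is only the name floated in this message, and the choice of FutureWarning is a guess. To limit the cost mentioned in the note about performance, the sketch warns at most once per call rather than once per zero-length match.

```python
import re
import warnings

_UNSET = object()  # sentinel: the caller did not pass split0 at all

def split_with_warning(pattern, string, split0=_UNSET):
    """Hypothetical wrapper: old behavior by default, plus a transition warning."""
    warned = False
    pieces, last = [], 0
    for m in re.finditer(pattern, string):
        if m.start() == m.end():
            if split0 is _UNSET and not warned:
                warnings.warn(
                    "pattern produced a zero-length match; "
                    "pass split0=True or split0=False explicitly",
                    FutureWarning, stacklevel=2)
                warned = True
            if split0 is not True:
                continue  # old behavior: ignore the empty match
        pieces.append(string[last:m.start()])
        last = m.end()
    pieces.append(string[last:])
    return pieces

print(split_with_warning('x*', 'abxxxcdefxxx', split0=False))
# ['ab', 'cdef', '']
```

Patterns that never match the empty string, and callers who pass split0 explicitly, never reach the warning branch.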
> Thoughts? Does this need a PEP?
It will if an argument starts now.