[Python-Dev] sre.split question

Chris King colanderman at gmail.com
Tue Jul 20 19:13:49 CEST 2004


I'm curious as to this bit of code in pattern_split() in Modules/_sre.c:

    if (state.start == state.ptr) {
        if (last == state.end)
            break;
        /* skip one character */
        state.start = (void*) ((char*) state.ptr + state.charsize);
        continue;
    }

This prevents split() from splitting on zero-length matches, which precludes the use of patterns that can successfully match a zero-length string (e.g. r'(?<=[A-Za-z])(?=[^A-Za-z])'). Skipping one character is of course the correct behaviour, but what purpose do the break and continue serve? The only one I can think of is to stop silly patterns like r'\s*' from returning a list of characters, but there may be other reasons I haven't thought of.

(Yes, I know I can get the effect I want by using finditer() ;))
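For reference, a minimal sketch of that finditer() workaround, written against the re module. The helper name split_at_matches and the sample string are made up for illustration. It cuts the text at every match the iterator reports, including zero-length ones:

```python
import re

# The zero-width pattern from above: matches the position between a
# letter and a following non-letter, consuming no characters.
BOUNDARY = re.compile(r'(?<=[A-Za-z])(?=[^A-Za-z])')

def split_at_matches(pattern, text):
    """Split text at every match of pattern, zero-length matches included.

    Hypothetical helper, not part of the re module.
    """
    pieces = []
    last = 0  # end of the previous match, i.e. start of the next piece
    for match in pattern.finditer(text):
        pieces.append(text[last:match.start()])
        last = match.end()
    pieces.append(text[last:])  # whatever follows the final match
    return pieces

sample = "foo bar, baz"
result = split_at_matches(BOUNDARY, sample)
print(result)
```

Because the pattern consumes nothing, joining the pieces gives back the original string. A pattern like r'\s*' fed to the same helper would produce many short and empty pieces, which may be exactly what the check in pattern_split() is meant to avoid.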