[Python-Dev] sre.split question

Chris King colanderman at gmail.com
Tue Jul 20 19:13:49 CEST 2004


I'm curious as to this bit of code in pattern_split() in Modules/_sre.c:

        if (state.start == state.ptr) {
            if (last == state.end)
                break;
            /* skip one character */
            state.start = (void*) ((char*) state.ptr + state.charsize);
            continue;
        }

This precludes the use of patterns that can successfully match zero-length
strings (e.g. r'(?<=[A-Za-z])(?=[^A-Za-z])').  Skipping one character
is of course the correct behaviour, but what purpose do the break and
continue serve?  The only one I can think of is to stop silly patterns
like r'\s*' from returning a list of characters, but there may be
other reasons I haven't thought of.

(Yes, I know I can get the effect I want by using finditer() ;))
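
For reference, a minimal sketch of that finditer() workaround. The helper
name split_zero_width is hypothetical (it is not part of the re module), and
it simply cuts the string at every match, empty or not:

```python
import re

def split_zero_width(pattern, string):
    """Split string at every match of pattern, including empty matches.

    Hypothetical helper for illustration; not part of the re module.
    """
    pieces = []
    last = 0
    for m in re.finditer(pattern, string):
        # Keep the text between the previous match and this one.
        pieces.append(string[last:m.start()])
        last = m.end()
    # Keep whatever follows the final match.
    pieces.append(string[last:])
    return pieces

# Split at each boundary between a letter and a following non-letter.
print(split_zero_width(r'(?<=[A-Za-z])(?=[^A-Za-z])', 'abc def ghi!'))
# prints ['abc', ' def', ' ghi', '!']
```

Since the pattern consumes no characters, the separators stay attached to
the pieces that follow them.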

