Splitting on a regex w/o consuming delimiter

Tim Peters tim.one at home.com
Sun Nov 11 03:02:51 EST 2001


[Lars Kellogg-Stedman]
> I knew I should have said it the first time:  this is a contrived
> example; what if the delimiter regex was something like '\s(!|@)\s'?
> Given:
>
>   foo ! bar @ baz @ xyzzy ! mumble
>
> You'd end up with:
>
>  [ 'foo', 'bar', 'baz', 'xyzzy', 'mumble' ]
>
> With no way of recovering the delimiter.
>
> Really, I just want to be able to split on (?=pattern), or some other
> method of splitting a string without consuming the delimiter.

You cannot:  if split actually split on a zero-width match, the result would
be an infinite loop (think about it <wink>).  Besides, it's ambiguous
whether you want the separator to be attached "to the left" or "to the
right".  The best you can do is write code to force your idea of what the
right answer is.  First use a *single* capturing group around the entire
delimiter regexp, so that the delimiters are included in the result list:

import re

p = re.compile(r'''
(            # capture the delimiter
    \s
    (?:      # do not capture anything else
        !|@
    )
    \s
)''', re.VERBOSE)

[note:  in this specific example, [!@] would be better than (?:!|@)]

Then the result is

>>> p.split('foo ! bar @ baz @ xyzzy ! mumble')
['foo', ' ! ', 'bar', ' @ ', 'baz', ' @ ', 'xyzzy', ' ! ', 'mumble']

You have to paste that together again, programming your own answers for
which delimiter goes with which non-delimiter, and what to do if a delimiter
was found at the start and/or end of the string.
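
One way to do that pasting is sketched below. The helper name split_keep and
its attach argument are made up for illustration; they are not part of the re
module. It assumes the pattern has exactly one capturing group wrapping the
whole delimiter, so that split() returns text and delimiters in strict
alternation, and it takes the simple policy of dropping any empty string left
over at either end.

```python
import re

# Same delimiter as above, using the character-class form.
p = re.compile(r'(\s[!@]\s)')

def split_keep(pattern, text, attach='right'):
    """Split text on pattern and glue each delimiter onto a neighbor.

    attach='right' puts each delimiter in front of the piece after it;
    attach='left' puts it at the end of the piece before it.
    Hypothetical helper: pattern must have exactly one capturing group.
    """
    parts = pattern.split(text)
    # With one capturing group, parts alternates text, delim, text, ..., text.
    pieces = parts[0::2]
    delims = parts[1::2]
    if attach == 'right':
        result = [pieces[0]] + [d + t for d, t in zip(delims, pieces[1:])]
    elif attach == 'left':
        result = [t + d for t, d in zip(pieces, delims)] + [pieces[-1]]
    else:
        raise ValueError("attach must be 'left' or 'right'")
    # Drop empty strings, e.g. the leading '' when the text starts with a
    # delimiter and attach='right'.
    return [s for s in result if s]

text = 'foo ! bar @ baz @ xyzzy ! mumble'
right = split_keep(p, text, 'right')
left = split_keep(p, text, 'left')
print(right)  # ['foo', ' ! bar', ' @ baz', ' @ xyzzy', ' ! mumble']
print(left)   # ['foo ! ', 'bar @ ', 'baz @ ', 'xyzzy ! ', 'mumble']
```

Either way, ''.join() of the result gives back the original string, which is
a handy sanity check that no delimiter was lost.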




