re.split() problem

Thu Sep 22 10:42:24 EDT 2005

Masayuki Takemura wrote:

> re.split() doesn't work as I intend.

it works as it's supposed to work.

empty matches are not considered to be valid split points, partially
because it doesn't really make sense to split on nothing in most cases,
but mostly because doing so will, most likely, result in a lot more "split
doesn't do what I want" reports than the current design...
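
as a small illustration (a sketch only; the sample string and the names s and m are picked for the example), you can check that a pattern such as '^$' with re.MULTILINE consumes no characters at all, which is what "empty match" means here:

```python
import re

s = "foo\nbar\n\nbaz"

# '^$' with MULTILINE matches the empty line between the two newlines
m = re.search("^$", s, re.MULTILINE)

print(m.span())         # (8, 8): start == end, so the match is zero-width
print(repr(m.group()))  # ''
```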

> For example,
>
> >>> r = re.compile('^$', re.MULTILINE)
> >>> r.split('foo\nbar\n\nbaz')
> ['foo\nbar\n\nbaz']
>
> but I expected ['foo\nbar\n', 'baz'].

so use an ordinary string split, or rephrase your RE.  if nothing else, you
can always invert your problem:

    >>> s = "foo\nbar\n\nbaz"
    >>> re.findall("(?s).*\n\n|.+$", s)
    ['foo\nbar\n\n', 'baz']

(this also lets you use finditer so you can process huge texts without
having to hold everything in memory)
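
the three suggestions above could be sketched roughly like this.  the lookbehind pattern and the helper name iter_chunks are made up for the example, not something from the original question, so treat them as one possible spelling rather than the only one:

```python
import re

s = "foo\nbar\n\nbaz"

# 1. ordinary string split on the blank line (drops both newlines)
print(s.split("\n\n"))            # ['foo\nbar', 'baz']

# 2. rephrased RE: consume the second newline instead of matching nothing
print(re.split(r"(?<=\n)\n", s))  # ['foo\nbar\n', 'baz']

# 3. inverted problem, lazily: yield one chunk at a time via finditer
def iter_chunks(text):
    for m in re.finditer("(?s).*\n\n|.+$", text):
        yield m.group()

print(list(iter_chunks(s)))       # ['foo\nbar\n\n', 'baz']
```

two caveats: the input string itself still has to be in memory, so what finditer saves you is the list of results.  and since ".*" is greedy, a text with several blank lines gets its first chunk cut at the last of them; ".*?" would stop at the first one instead.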

> Will it be fixed in Python 2.4.2?

2.4.2 is a bug fix release.  "doesn't work as you intend" doesn't really
count as a bug (unless you wrote the specification, of course, which I
don't think you did...)

</F>