[Python-Dev] re.split on empty patterns

Tim Peters tim.peters at gmail.com
Sat Aug 7 19:58:16 CEST 2004


[A.M. Kuchling <amk at amk.ca>]
> The re.split() method ignores zero-length pattern matches.  Patch
> #988761 adds an emptyok flag to split that causes zero-length matches
> to trigger a split.

...

> IMHO this feature is clearly useful,

Yes it is!  Or, more accurately, it can be, when it's intended to
match an empty string.  It's a bit fuzzy because regexps are so
error-prone, and writing a regexp that matches an empty string by
accident is easy.
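
A rough way to see how easy the accident is: try the pattern against the
empty string.  The helper name can_match_empty below is made up for
illustration, and the check is only one-directional.  True means the
pattern can definitely match zero characters; False does not rule out
zero-width matches from things like lookaheads.

```python
import re

def can_match_empty(pattern):
    # Hypothetical helper: if the pattern matches '', it can also
    # produce a zero-length match inside a longer string.
    return re.compile(pattern).match('') is not None

# Each "optional everything" pattern can match nothing at all; the
# versions that require at least one character cannot.
for pattern in ['x*', 'x+', r'\s*,?\s*', r'\s*,\s*']:
    print(repr(pattern), can_match_empty(pattern))
# 'x*' True
# 'x+' False
# '\\s*,?\\s*' True
# '\\s*,\\s*' False
```

The third pattern is the typical trap: it looks like "a comma with
optional whitespace", but every piece of it is optional.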

> and would be happy to commit the patch as-is.

Haven't looked at the patch, though.

> Question: do we want to make this option the new default?  Existing
> patterns that can produce zero-length matches would change their
> meanings:
>
> >>> re.split('x*', 'abxxxcdefxxx')
> ['ab', 'cdef', '']
> >>> re.split('x*', 'abxxxcdefxxx', emptyok=True)
> ['', 'a', 'b', '', 'c', 'd', 'e', 'f', '', '']
>
> (I think the result of the second match points up a bug in the patch;
> the empty strings in the middle seem wrong to me.  Assume that gets
> fixed.)

Agreed.

> Anyway, we therefore can't just make this the default in 2.4.  We
> could trigger a warning when emptyok is not supplied and a split
> pattern results in a zero-length match; users could supply
> emptyok=False to avoid the warning.  Patterns that never have a
> zero-length match would never get the warning.  2.5 could then set
> emptyok to True.
>
> Note: raising the warning might cause a serious performance hit for
> patterns that get zero-length matches a lot, which would make 2.4
> slower in certain cases.

If you don't intend to change the default, there's no problem.  I like
"no problem".  The feature isn't needed so often that a change of
default can't wait for Python 3.  In the meantime, "emptyok" is an odd
name since it's always "ok" to have an empty match.  "split0=True"
reads better to me, since the effect is to split on a 0-length match. 
split_on_empty_match would be too wordy.
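
To make the intended effect concrete, here is a minimal sketch in pure
Python.  It is not the code from patch #988761, which I haven't looked
at.  The function name split0_sketch is hypothetical, and the rule that
a zero-length match splits only when the piece before it would be
non-empty is just one assumed reading of "the empty strings in the
middle get fixed".

```python
import re

def split0_sketch(pattern, string, split0=False):
    """Hypothetical helper; an assumed reading, not the actual patch."""
    regex = re.compile(pattern)
    pieces = []
    last = 0   # index just past the previous split point
    pos = 0    # index where the next search starts
    while pos <= len(string):
        m = regex.search(string, pos)
        if m is None:
            break
        if m.end() > m.start():
            # Non-empty match: always a split point.
            pieces.append(string[last:m.start()])
            last = pos = m.end()
        else:
            # Zero-length match: split only if asked to, and only when
            # the piece before it would be non-empty.
            if split0 and m.start() > last:
                pieces.append(string[last:m.start()])
                last = m.start()
            # Step past the empty match so the loop always advances.
            pos = m.end() + 1
    pieces.append(string[last:])
    return pieces

print(split0_sketch('x*', 'abxxxcdefxxx'))
# ['ab', 'cdef', '']
print(split0_sketch('x*', 'abxxxcdefxxx', split0=True))
# ['a', 'b', 'c', 'd', 'e', 'f', '']
print(split0_sketch('(?=[A-Z])', 'splitOnEmptyMatch', split0=True))
# ['split', 'On', 'Empty', 'Match']
```

With split0 left off, the zero-length matches are skipped and the first
call gives the same list as the existing re.split example quoted above.
The last call is the kind of case where an empty match is what you
actually meant.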

> Thoughts?  Does this need a PEP?

It will if an argument starts now <wink>.

