Strange re behavior: normal?

Mike Rovner mike at nospam.com
Thu Aug 14 18:00:52 EDT 2003


Robin Munn wrote:

> >>> re.split(r'\b', 'a b c d')
> ['a b c d']


Perl 5.8 does that:

> a word, I would have expected re.split(r'\b', 'a b c d') to produce
>     ['a', ' ', 'b', ' ', 'c', ' ', 'd']


> But I didn't expect that re.split(r'\b', 'a b c d') would yield no
> splits whatsoever. The module doc says "split(pattern, string[,
> maxsplit = 0]): split string by the occurrences of pattern".
> re.findall() seems to think that \b occurs eight times in 'a b c d':

But it says nothing about empty matches, in contrast to the findall doc:
    "Empty matches are included in the result."

> >>> re.findall(r'\b', 'a b c d')
> ['', '', '', '', '', '', '', '']
>
> So why doesn't re.split() think so? I'm puzzled.

It treats r'\b' as an empty ('') pattern. (Hint: consider the error for r'\b?'.)
IMHO that split behavior is a bug, although technically it is not.
(From re manual:
"This module provides regular expression matching operations similar to
those found in Perl.")
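
If you want the Perl-like result regardless of how re.split() treats empty
matches, here is a minimal sketch built on re.finditer(), which does report
zero-width matches. The helper name split_on_boundaries is made up for this
example (it is not part of the re module), and dropping the empty leading and
trailing pieces is my assumption about what "expected" means here:

```python
import re

def split_on_boundaries(pattern, text):
    """Hypothetical helper: split text at every match of pattern,
    including zero-width matches such as r'\\b'."""
    pieces = []
    last = 0
    for m in re.finditer(pattern, text):
        # Assumption: an empty match at the very start or very end would
        # only add an empty piece, so skip it (this mirrors the Perl output).
        if m.start() == m.end() and m.start() in (0, len(text)):
            continue
        pieces.append(text[last:m.start()])
        last = m.end()
    # Whatever follows the last match is the final piece.
    pieces.append(text[last:])
    return pieces

print(split_on_boundaries(r'\b', 'a b c d'))
```

For 'a b c d' this gives ['a', ' ', 'b', ' ', 'c', ' ', 'd'], i.e. the list
Robin expected.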

Regards,
Mike

PS.
Perl also produces the expected result for
split /\b/, 'a  b  c  d'
-> 'a', '  ', 'b', '  ', 'c', '  ', 'd'







