Strange re behavior: normal?
Mike Rovner
mike at nospam.com
Thu Aug 14 18:00:52 EDT 2003
Robin Munn wrote:
>>>> re.split(r'\b', 'a b c d')
> ['a b c d']
> a word, I would have expected re.split(r'\b', 'a b c d') to produce
> ['a', ' ', 'b', ' ', 'c', ' ', 'd']
Perl 5.8 does that.
> But I didn't expect that re.split(r'\b', 'a b c d') would yield no
> splits whatsoever. The module doc says "split(pattern, string[,
> maxsplit = 0]): split string by the occurrences of pattern".
But it says nothing about an empty pattern, in contrast to findall:
"Empty matches are included in the result."
> re.findall() seems to think that \b occurs eight times in 'a b c d':
>>>> re.findall(r'\b', 'a b c d')
> ['', '', '', '', '', '', '', '']
>
> So why doesn't re.split() think so? I'm puzzled.
It treats r'\b' as an empty ('') pattern. (Hint: consider the error for r'\b?'.)
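To make the "empty pattern" point concrete, the sketch below (Python 3 syntax, not from the original thread) lists the span of every \b match. Each one is zero-width, and Python 2's re.split() skipped empty matches, so it found nothing to split on.

```python
import re

text = 'a b c d'

# Record the (start, end) span of every \b match in the text.
spans = [m.span() for m in re.finditer(r'\b', text)]

# \b consumes no characters, so every match has start == end.
zero_width = all(start == end for start, end in spans)

print(spans)       # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7)]
print(zero_width)  # True
```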
IMHO that split behavior is a bug, although technically it is not.
(From the re manual:
"This module provides regular expression matching operations similar to
those found in Perl.")
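Where re.split() ignores empty matches, one workaround is to build the split from re.finditer(), which does report them. A minimal sketch; the helper name split_on_matches is hypothetical, and unlike re.split() it does not return the text of capturing groups.

```python
import re

def split_on_matches(pattern, string):
    r"""Hypothetical helper: split `string` at every match of `pattern`,
    including zero-width matches such as \b."""
    pieces = []
    last = 0
    for m in re.finditer(pattern, string):
        # Keep the text between the previous match and this one.
        pieces.append(string[last:m.start()])
        last = m.end()
    # Keep whatever follows the final match.
    pieces.append(string[last:])
    return pieces

print(split_on_matches(r'\b', 'a b c d'))
# ['', 'a', ' ', 'b', ' ', 'c', ' ', 'd', '']
```

The leading and trailing '' come from the boundaries at the very start and end of the string.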
Regards,
Mike
PS.
Perl also produces the expected result for
split /\b/, 'a b c d'
-> 'a', ' ', 'b', ' ', 'c', ' ', 'd'
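For later readers: Python 3.7 changed re.split() so that it does split on patterns that can match an empty string, including \b. Perl's split additionally never produces an empty leading field from a zero-length match at the start of the string, and strips trailing empty fields. The sketch below approximates Perl's output for this case by discarding empty strings; it assumes Python 3.7 or later, and perl_like_split is a hypothetical name.

```python
import re

def perl_like_split(text):
    r"""Hypothetical helper approximating Perl's  split /\b/, $text ."""
    # Python 3.7+: re.split() splits on zero-width matches such as \b,
    # so for 'a b c d' the result starts and ends with ''.
    fields = re.split(r'\b', text)
    # Drop empty fields. \b matches sit at distinct positions, so every
    # interior field is non-empty; only the ends can be ''.
    return [f for f in fields if f]

print(re.split(r'\b', 'a b c d'))   # ['', 'a', ' ', 'b', ' ', 'c', ' ', 'd', '']
print(perl_like_split('a b c d'))   # ['a', ' ', 'b', ' ', 'c', ' ', 'd']
```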