Strange re behavior: normal?
Robin Munn
rmunn at pobox.com
Thu Aug 14 16:33:05 EDT 2003
How is re.split supposed to work? This wasn't at all what I expected:
[rmunn@localhost ~]$ python
Python 2.2.2 (#1, Jan 12 2003, 12:07:20)
[GCC 3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.split(r'\W+', 'a b c d')
['a', 'b', 'c', 'd']
>>> # Expected result.
...
>>> re.split(r'\b', 'a b c d')
['a b c d']
>>> # Huh?
Since \b matches the empty string, but only at the beginning and end of
a word, I would have expected re.split(r'\b', 'a b c d') to produce
either:
['', 'a', ' ', 'b', ' ', 'c', ' ', 'd', '']
or:
['a', ' ', 'b', ' ', 'c', ' ', 'd']
But I didn't expect that re.split(r'\b', 'a b c d') would yield no splits
whatsoever. The module doc says "split(pattern, string[, maxsplit = 0]):
split string by the occurrences of pattern". re.findall() seems to think
that \b occurs eight times in 'a b c d':
>>> re.findall(r'\b', 'a b c d')
['', '', '', '', '', '', '', '']
So why doesn't re.split() think so? I'm puzzled.
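To make the expectation concrete, here is a minimal sketch of the split I had in mind, built by hand from re.finditer. It assumes re.finditer reports the same empty matches that re.findall does; the helper name split_at_matches is hypothetical and not part of the re module.

```python
import re

def split_at_matches(pattern, string):
    """Split string at every match of pattern, empty matches included.

    Hypothetical helper for illustration; not part of the re module.
    """
    pieces = []
    last = 0
    for m in re.finditer(pattern, string):
        # Keep the text between the previous match and this one.
        pieces.append(string[last:m.start()])
        last = m.end()
    # Keep whatever follows the final match.
    pieces.append(string[last:])
    return pieces

print(split_at_matches(r'\b', 'a b c d'))
```

With r'\b' this gives the first of the two lists above, including the empty strings at both ends; with r'\W+' it gives the same result as re.split.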
--
Robin Munn <rmunn at pobox.com> | http://www.rmunn.com/ | PGP key 0x6AFB6838
-----------------------------+-----------------------+----------------------
"Remember, when it comes to commercial TV, the program is not the product.
YOU are the product, and the advertiser is the customer." - Mark W. Schumann