help with re.split()

Carel Fellinger cfelling at iae.nl
Tue Feb 20 18:31:24 EST 2001


Steve Mak <stevemak at softhome.net> wrote:
> Hi guys,

>     How do I use the re.split() function so it splits a line of text,
> keeping only the words, i.e. excluding any symbols, spaces, etc.? I tried
> p=re.split('[. ]+', line), but some spaces are being kept.

Maybe some of those spaces are tabs in disguise?
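
To make the tab suspicion concrete, here is a small sketch. The input
line is made up for illustration (your real data may differ); it only
shows that a tab is not covered by '[. ]+' but is covered once \s is
added to the character class.

```python
import re

# Hypothetical input: a tab hides between "Hello." and "how".
line = "Hello.\thow are  you"

# The original pattern treats only '.' and ' ' as separators,
# so the tab stays glued to the word that follows it.
original = re.split(r'[. ]+', line)
print(original)   # ['Hello', '\thow', 'are', 'you']

# \s matches any whitespace (space, tab, newline, ...).
fixed = re.split(r'[.\s]+', line)
print(fixed)      # ['Hello', 'how', 'are', 'you']
```

Printing repr(line) is a quick way to check whether your own input
really contains tabs.
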
The approach you took has its pitfalls, as in:

>>> import re
>>> re.split(r'\W+', 'Hello, how are you?')
['Hello', 'how', 'are', 'you', '']

Notice the empty string at the end of the resulting list.
That's because an empty string follows the matched "?"
(there always is one :).  The same would happen with a leading match.
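
If you want to stay with re.split(), one possible workaround (a sketch,
not the only way) is to filter the empty strings out afterwards. The
sample string below is invented so that it has a match at both ends.

```python
import re

# Separators at both the start (', ') and the end ('?') of the string.
parts = re.split(r'\W+', ', Hello, how are you?')
print(parts)   # ['', 'Hello', 'how', 'are', 'you', '']

# Empty strings are false, so a simple filter drops them.
words = [p for p in parts if p]
print(words)   # ['Hello', 'how', 'are', 'you']
```
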

Maybe finding all the words in the line is simpler (\w+ matches any
run of alphanumeric characters, underscore included):

>>> import re
>>> re.findall(r'\w+', ' Hello, how are you? ')
['Hello', 'how', 'are', 'you']
>>> re.findall(r'\w+', ' Hello, 2 you 2! ')
['Hello', '2', 'you', '2']
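
Whether \w+ is the right notion of "word" depends on your data. The
sketch below wraps the findall() call in a helper (the name words() and
the sample strings are my own, purely for illustration) and shows two
cases where you might want a different pattern.

```python
import re

def words(line):
    """Hypothetical helper: return every run of \\w characters in line."""
    return re.findall(r'\w+', line)

# Underscores and digits count as word characters.
with_underscore = words('snake_case is 1 word')
print(with_underscore)   # ['snake_case', 'is', '1', 'word']

# Apostrophes do not, so contractions get split in two.
contraction = words("don't stop")
print(contraction)       # ['don', 't', 'stop']

# If only ASCII letters should count, spell the class out instead.
letters_only = re.findall(r'[A-Za-z]+', 'snake_case is 1 word')
print(letters_only)      # ['snake', 'case', 'is', 'word']
```
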
-- 
groetjes, carel


