ATTN: Carel Fellinger

Steve Mak stevemak at softhome.net
Tue Feb 20 18:52:21 EST 2001


Thanks for your help!!!!!

Carel Fellinger wrote:

> Steve Mak <stevemak at softhome.net> wrote:
> > Hi guys,
>
> >     How do I use the re.split() function so it splits a line of text,
> > keeping only the word. ie: it excludes any symbols, spaces, etc. I tried
> > p=re.split('[. ]+', line), but some spaces are being kept.
>
> Maybe some spaces are tabs in disguise?
> The approach you took has its pitfalls, like in
>
> >>> import re
> >>> re.split(r'\W+', 'Hello, how are you?')
> ['Hello', 'how', 'are', 'you', '']
>
> Notice this empty string at the end of the split-list.
> That's because after the matched "?" there is an empty string
> (there always is:).  The same would happen with a leading match.
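
One way around those empty strings, if you want to stay with re.split(), is
to filter them out afterwards. This is only a sketch: split_words is a
made-up helper name, not something from the re module.

```python
import re

def split_words(line):
    """Split on runs of non-word characters and drop the empty pieces
    that a leading or trailing match leaves behind."""
    return [piece for piece in re.split(r'\W+', line) if piece]

print(split_words('Hello, how are you?'))    # ['Hello', 'how', 'are', 'you']
print(split_words(' Hello, how are you? '))  # ['Hello', 'how', 'are', 'you']
```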
>
> Maybe finding all words in that line is simpler, like (\w+ matches
> runs of alphanumeric characters, underscore included):
>
> >>> import re
> >>> re.findall(r'\w+', ' Hello, how are you? ')
> ['Hello', 'how', 'are', 'you']
> >>> re.findall(r'\w+', ' Hello, 2 you 2! ')
> ['Hello', '2', 'you', '2']
> --
> groetjes, carel
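
For the record, here is a small comparison of the pattern from the original
question with the findall() approach. The input line is invented for
illustration, with a tab hidden after the comma, so it may not match what
was in the real file.

```python
import re

# Made-up input: a tab follows the comma, two spaces precede "you".
line = 'Hello,\thow are  you.'

# '[. ]+' splits only on dots and plain spaces, so the comma and the tab
# stay glued to the words, and the trailing dot leaves an empty string.
kept = re.split('[. ]+', line)
print(kept)    # ['Hello,\thow', 'are', 'you', '']

# r'\w+' collects the words themselves, whatever separates them.
words = re.findall(r'\w+', line)
print(words)   # ['Hello', 'how', 'are', 'you']
```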



