Splitting a string on non-alpha characters
Tim Chase
python.list at tim.thechases.com
Sat Sep 23 10:59:26 EDT 2006
> I'm relatively new to Python, but I've already noticed that many lines of
> Python code can be simplified to a one-liner by some clever coder. As
> the topic says, I'm trying to split lines like this:
>
> 'foo bar- blah/hm.lala' -> [foo, bar, blah, hm, lala]
>
> 'foo////bbbar.. xyz' -> [foo, bbbar, xyz]
>
> Obviously a for loop catching just chars could do the trick, but I'm
> looking for a more elegant way. Can anyone help?
First, I presume you mean that you want back
['foo', 'bar', 'blah', 'hm', 'lala']
instead of
[foo, bar, blah, hm, lala]
(which would presume you have variables named as such, which is
kinda funky)
That said...
Well, I'm sure there are scads of ways to do this. I know
regexps can do it fairly cleanly:
>>> import re
>>> r = re.compile(r'\w+')
>>> s = 'foo bar- blah/hm.lala'
>>> s2 = 'foo////bbbar.. xyz'
>>> r.findall(s)
['foo', 'bar', 'blah', 'hm', 'lala']
>>> r.findall(s2)
['foo', 'bbbar', 'xyz']
The regexp in question (r'\w+') translates to "one or more 'word'
characters". The definition of a 'word' character depends on your
locale/encoding and on the flags you pass to re, but at a minimum it
includes the ASCII letters, the digits, and the underscore.
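A quick sketch of what "word character" means in practice. The sample
string below is made up for illustration and is not from the original
question. Note that \w matches the underscore as well as letters and
digits, so none of those will split a token:

```python
import re

# \w matches letters, digits, and the underscore,
# so only the space and the hyphen act as separators here.
word_re = re.compile(r'\w+')

sample = 'foo_bar baz42-qux'  # hypothetical input, for illustration only
tokens = word_re.findall(sample)
print(tokens)  # ['foo_bar', 'baz42', 'qux']
```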
If you're not interested in digits (or underscores), and only want the
26*2 ASCII letters, you can use
>>> r = re.compile(r'[a-zA-Z]+')
instead (which would be "one or more letters in the set [a-zA-Z]").
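If you want to reuse the letters-only version, something like the
following sketch might do. The function name split_alpha is my own
invention for this example, not anything from the standard library:

```python
import re

# Precompile once; matches runs of ASCII letters only.
_LETTERS = re.compile(r'[a-zA-Z]+')

def split_alpha(text):
    """Return the runs of ASCII letters in text, dropping everything else."""
    return _LETTERS.findall(text)

print(split_alpha('foo bar- blah/hm.lala'))  # ['foo', 'bar', 'blah', 'hm', 'lala']
print(split_alpha('foo////bbbar.. xyz'))     # ['foo', 'bbbar', 'xyz']
print(split_alpha('abc123def_ghi'))          # ['abc', 'def', 'ghi']
```

Using findall rather than re.split(r'[^a-zA-Z]+', text) is a deliberate
choice here: split can leave empty strings at the ends of the result when
the input starts or ends with a separator, while findall only returns the
matches themselves.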
-tkc