Split text file into words

Heiko Wundram modelnine at ceosg.de
Tue Mar 8 08:53:25 EST 2005


On Tuesday 08 March 2005 14:43, qwweeeit wrote:
> The standard split() can use only one delimiter. To split a text file
> into words you need multiple delimiters like blanks, punctuation, math
> signs (+-*/), parentheses and so on.
>
> I didn't succeed in using re.split()...

Then try again... ;) No, seriously, re.split() can do what you want. Just
think about which characters count as word delimiters.

Say you want to split on all whitespace, plus ",", "." and "?". Then you'd
use a character class followed by "+", so that a run of several delimiters
counts as a single split point:

heiko at heiko ~ $ python
Python 2.3.5 (#1, Feb 27 2005, 22:40:59)
[GCC 3.4.3 20050110 (Gentoo Linux 3.4.3.20050110, ssp-3.4.3.20050110-0, 
pie-8.7 on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> teststr = "Hello qwweeeit, how are you? I am fine, today, actually."
>>> re.split(r"[\s\.,\?]+",teststr)
['Hello', 'qwweeeit', 'how', 'are', 'you', 'I', 'am', 'fine', 'today', 
'actually', '']
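
Note the '' at the end of that list: the test string ends in a delimiter
("."), so re.split() returns an empty final field. The same happens at the
front if the text starts with a delimiter. A minimal sketch of filtering
those out follows; the helper name split_words is made up for illustration
and is not part of the re module:

```python
import re

def split_words(text):
    # Split on runs of whitespace, commas, periods and question marks,
    # then drop the empty strings that leading/trailing delimiters produce.
    return [word for word in re.split(r"[\s.,?]+", text) if word]

teststr = "Hello qwweeeit, how are you? I am fine, today, actually."
print(split_words(teststr))
```

Inside a character class, ".", and "?" do not need a backslash, so the
pattern here is equivalent to the one in the session above.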

Extending with other word separators shouldn't be hard... Just have a look at

http://docs.python.org/lib/re-syntax.html
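
As a rough sketch of such an extension, covering the math signs and
parentheses from the original question: the exact set of delimiters below is
an assumption, so adjust it to your text. The names WORD_DELIMITERS and
split_on_delimiters are made up for this example. The one character that
needs care is "-", which must be escaped (or placed last) inside a character
class so it is not read as a range.

```python
import re

# Assumed delimiter set: whitespace, common punctuation, parentheses
# and the math signs + - * / =. The "-" is escaped so it is taken literally.
WORD_DELIMITERS = re.compile(r"[\s.,;:!?()+\-*/=]+")

def split_on_delimiters(text):
    # Split on runs of delimiters and discard empty leading/trailing fields.
    return [word for word in WORD_DELIMITERS.split(text) if word]

print(split_on_delimiters("total = (a+b)*c/2; done?"))
```

Compiling the pattern once is optional; it just keeps the delimiter set in
one place if you split many lines of a file.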

HTH!

-- 
--- Heiko.

