Split text file into words
Heiko Wundram
modelnine at ceosg.de
Tue Mar 8 08:53:25 EST 2005
On Tuesday 08 March 2005 14:43, qwweeeit wrote:
> The standard split() can use only one delimiter. To split a text file
> into words you need multiple delimiters like blank, punctuation, math
> signs (+-*/), parentheses and so on.
>
> I didn't succeed in using re.split()...
Then try again... ;) No, seriously, re.split() can do what you want. Just
think about which characters count as word delimiters.
Say, you want to split on all whitespace, and ",", ".", and "?", then you'd
use something like:
heiko at heiko ~ $ python
Python 2.3.5 (#1, Feb 27 2005, 22:40:59)
[GCC 3.4.3 20050110 (Gentoo Linux 3.4.3.20050110, ssp-3.4.3.20050110-0,
pie-8.7 on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> teststr = "Hello qwweeeit, how are you? I am fine, today, actually."
>>> re.split(r"[\s\.,\?]+",teststr)
['Hello', 'qwweeeit', 'how', 'are', 'you', 'I', 'am', 'fine', 'today',
'actually', '']
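Note the trailing '' in that result: the test string ends with a delimiter ("."), so re.split() also reports the empty piece after it. A minimal sketch of two ways around that, assuming you never want empty strings in the word list (the names words_split and words_findall are made up for this example):

```python
import re

teststr = "Hello qwweeeit, how are you? I am fine, today, actually."

# Option 1: split on the delimiters as before, then drop the empty
# strings that show up when the text starts or ends with a delimiter.
words_split = [w for w in re.split(r"[\s.,?]+", teststr) if w]

# Option 2: match the words themselves instead of the delimiters.
# [^\s.,?]+ means "one or more characters that are not delimiters".
words_findall = re.findall(r"[^\s.,?]+", teststr)

print(words_split)
print(words_findall)
```

Both variants should print the same ten words, without the empty string at the end. Inside a character class "." and "?" need no backslash, so the pattern is a bit shorter than the one in the session above but matches the same delimiters.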
Extending with other word separators shouldn't be hard... Just have a look at
http://docs.python.org/lib/re-syntax.html
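As a rough sketch of such an extension, here is one possible character class that also covers the math signs and parentheses from the original question. The exact delimiter set, the name DELIMITERS and the sample string are assumptions for illustration only; adjust them to whatever counts as a separator in your text:

```python
import re

# Whitespace, common punctuation, math signs and parentheses.
# The "-" is escaped so it is not read as a range inside the class.
DELIMITERS = re.compile(r"[\s.,?!;:+\-*/()]+")

sample = "total(a+b)*c/d - 2; done?"

# Split on runs of delimiters and drop the empty string that is left
# behind when the text starts or ends with a delimiter.
words = [w for w in DELIMITERS.split(sample) if w]

print(words)
```

For a whole file you could apply the same pattern to the string returned by the file object's read() method.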
HTH!
--
--- Heiko.