Looking for very simple general purpose tokenizer

python at g2swaroop.net
Mon Jan 19 09:39:21 EST 2004


Hi,
    Maybe the Spark module can help you.
Here's a DeveloperWorks tutorial on it:

http://www-106.ibm.com/developerworks/linux/library/l-spark.html
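
If Spark turns out to be heavier than you need, something close to the
interface you describe can be sketched with the standard re module:
re.split() keeps the delimiters in the result when the pattern contains a
capturing group. A minimal sketch (the function and variable names just
follow the example in your post):

import re

def tokenize(text, splitchars):
    # Alternate between the escaped split characters; the capturing
    # group makes re.split() keep the delimiters in the output.
    pattern = '(%s)' % '|'.join(re.escape(c) for c in splitchars)
    # Adjacent delimiters produce empty strings, so filter those out.
    return [tok for tok in re.split(pattern, text) if tok]

splitchars = [' ', '\n', '=', '/']
print(tokenize('dt = 0.1/n', splitchars))
# -> ['dt', ' ', '=', ' ', '0.1', '/', 'n']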

Hope it helps,
Swaroop

---------------------------------------------------------------------------
Hi group,

I need to parse various text files in Python. I was wondering if there is a
general-purpose tokenizer available. I know about split(), but this
otherwise very handy method does not allow me to specify a list of
splitting characters, only one at a time, and it removes my splitting
operators (fine for spaces and \n's, but not for =, /, etc.). Furthermore, I
tried the tokenize module, but that is specific to Python source code and is
way too heavy for me. I am looking for something like this:


splitchars = [' ', '\n', '=', '/', ....]
tokenlist = tokenize(rawfile, splitchars)

Is there something like this available in Python, or has anyone already
made it? Thank you in advance.

Maarten
-- 
===================================================================
Maarten van Reeuwijk                        Heat and Fluid Sciences
Phd student                             dept. of Multiscale Physics
www.ws.tn.tudelft.nl                 Delft University of Technology




