Good String Tokenizer

James Stroud jstroud at mbi.ucla.edu
Tue Jul 24 15:27:18 EDT 2007


JamesHoward wrote:
> I have searched the board and noticed that there isn't really any sort
> of good implementation of a string tokenizer that will tokenize based
> on a custom set of tokens and return both the tokens and the parts
> between the tokens.
> 
> For example, if I have the string:
> 
> "Hello, World!  How are you?"
> 
> And my splitting points are comma and exclamation point, then I would
> expect to get back:
> 
> ["Hello", ",", " World", "!", "  How are you?"]
> 
> Does anyone know of a tokenizer that will allow for this sort of use?
> 
> Thanks in advance,
> Jim Howard
> 

Pyparsing: http://pyparsing.wikispaces.com/
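
For a case as simple as the one quoted, the standard library's re.split
may already be enough, since it keeps the delimiters in its result when
the pattern contains a capturing group. The following is only a minimal
sketch using re rather than Pyparsing; the helper name tokenize and its
delimiters argument are made up for illustration, and it assumes every
splitting point is a single character.

```python
import re

def tokenize(text, delimiters):
    """Split text on single-character delimiters, keeping the delimiters.

    Hypothetical helper for illustration; not part of any library.
    """
    if not delimiters:
        # An empty character class would be an invalid pattern.
        return [text]
    # re.escape guards characters such as ']' or '^' inside the class.
    # The parentheses form a capturing group, so re.split returns the
    # matched delimiters along with the text between them.
    pattern = "([" + re.escape(delimiters) + "])"
    return re.split(pattern, text)

print(tokenize("Hello, World!  How are you?", ",!"))
# ['Hello', ',', ' World', '!', '  How are you?']
```

Note that adjacent delimiters, or a delimiter at the very start or end
of the string, produce empty strings in the result, which may need to
be filtered out depending on the use.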

James

-- 
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/


