[Tutor] how to parse multiple-character words from plaintext

Sat Feb 23 12:43:44 CET 2008

John Gunderman wrote:
> I am looking to parse plaintext from a document. However, I am
> confused about the actual methodology of it. This is because some of the
> words will be multiple digits or characters, and I don't know the
> length of the words before the parse. Is there a way to somehow have
> open() grab something until it sees a \t or ' '? I was thinking I could
> have it count ahead the number of spaces till the stopping point and
> then parse till that point using read(), but that seems sort of
> inefficient. Is there a better way to pull this off? Thanks in advance.

How big is the file? Can you just read the whole document and parse the 
resulting string? Or read by lines?
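
If the words are separated only by spaces, tabs, or newlines, either
approach can be sketched with str.split(), which needs no advance
knowledge of word lengths. This is a minimal illustration, not a full
parser; the helper names words_from_text and words_by_line and the
sample data are made up for the example, and io.StringIO stands in for
a file returned by open().

```python
import io

def words_from_text(text):
    # With no arguments, str.split() splits on any run of whitespace
    # (spaces, tabs, newlines) and never returns empty strings.
    return text.split()

def words_by_line(lines):
    # Accepts any iterable of lines, including an open file object,
    # so a large file does not have to be held in memory at once.
    for line in lines:
        for word in line.split():
            yield word

# Hypothetical sample data; io.StringIO stands in for open("somefile").
sample = "alpha\t42 beta\n7   gamma\n"

print(words_from_text(sample))
print(list(words_by_line(io.StringIO(sample))))
```

With a real file, words_from_text(f.read()) covers the "read the whole
document" case and words_by_line(f) covers the "read by lines" case.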

Depending on how complex your parsing is, you might want to use 
pyparsing or one of the other Python parser libraries.
http://pyparsing.wikispaces.com/
http://nedbatchelder.com/text/python-parsers.html

Kent