[Tutor] how to parse multiple-character words from plaintext
Kent Johnson
kent37 at tds.net
Sat Feb 23 12:43:44 CET 2008
John Gunderman wrote:
> I am looking to parse a plaintext from a document. However, I am
> confused about the actual methodology of it. This is because some of the
> words will be multiple digits or characters. However, I don't know the
> length of the words before the parse. Is there a way to somehow have
> open() grab something until it sees a \t or ' '? I was thinking I could
> have it count ahead the number of spaces till the stopping point and
> then parse till that point using read(), but that seems sort of
> inefficient. Is there a better way to pull this off? Thanks in advance.
How big is the file? Can you just read the whole document and parse the
resulting string? Or read by lines?
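
For example, here is a minimal sketch of both approaches, assuming the words are separated by spaces, tabs (written '\t' in Python), or newlines. Calling split() with no arguments splits on any run of whitespace, so you never need to know the word lengths ahead of time. The function names and the sample data are made up for illustration.

```python
import os
import tempfile


def words_from_whole_file(path):
    """Read the entire file into one string and split it on whitespace."""
    with open(path) as f:
        return f.read().split()


def words_by_line(path):
    """Yield words one line at a time, so the whole file is never held in memory."""
    with open(path) as f:
        for line in f:
            for word in line.split():
                yield word


if __name__ == "__main__":
    # Hypothetical sample data: words of varying length, separated by
    # spaces, a tab, and newlines.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("12 abc\t7\nhello  world\n")
    try:
        print(words_from_whole_file(tmp.name))
        print(list(words_by_line(tmp.name)))
    finally:
        os.remove(tmp.name)
```

The first function is simplest if the file fits comfortably in memory. The second is probably the better choice for a large file, since it only ever holds one line at a time.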
Depending on how complex your parsing is, you might want to use
pyparsing or one of the other Python parser libraries.
http://pyparsing.wikispaces.com/
http://nedbatchelder.com/text/python-parsers.html
Kent