Help parsing a text file

Tim Roberts timr at probo.com
Wed Aug 31 01:37:14 EDT 2011


William Gill <noreply at domain.invalid> wrote:
>
>My initial passes into Python have been very unfocused (a scatter gun of 
>too many possible directions, yielding very messy results), so I'm 
>asking for some suggestions, or algorithms (possibly even examples) that 
>may help me focus.
>
>I'm not asking anyone to write my code, just to nudge me toward a more 
>disciplined approach to a common task, and I promise to put in the 
>effort to understand the underlying fundamentals.

Python includes "sgmllib", which was designed to parse SGML-based files,
including both neat XML and slimy HTML, and "htmllib", which derives from
it.  I have used "htmllib" to parse HTML files where the tags were not
properly closed.  Perhaps you could start from "htmllib" and modify it to
handle the quirks in your particular format.
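
As a rough sketch of that idea: "sgmllib" and "htmllib" were deprecated in
Python 2.6 and removed in Python 3, so the example below swaps in
"html.parser.HTMLParser" from the standard library, which offers the same
event-driven style and also tolerates tags that are never closed.  The class
name, the function name and the sample markup are hypothetical, since the
actual file format is not shown here; treat this as a starting point to
adapt, not a finished parser.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect (tag, text) pairs, pairing each text run with the most
    recently opened tag.  Closing tags are ignored, so markup whose tags
    are never closed is handled the same way as well-formed markup."""

    def __init__(self):
        super().__init__()
        self.current_tag = None   # name of the last start tag seen
        self.records = []         # list of (tag, text) tuples

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        self.current_tag = tag

    def handle_data(self, data):
        text = data.strip()
        # Skip whitespace-only runs and any text before the first tag.
        if text and self.current_tag is not None:
            self.records.append((self.current_tag, text))


def parse_text(text):
    """Hypothetical helper: feed a string through the collector."""
    parser = TagTextCollector()
    parser.feed(text)
    parser.close()   # flush any text still buffered at end of input
    return parser.records


if __name__ == "__main__":
    # Made-up sample in which no tag is ever closed.
    sample = "<name>William<city>Springfield<note>no closing tags here"
    for tag, value in parse_text(sample):
        print(tag, "=", value)
```

Quirks specific to your format would go into the handler methods, for
example by overriding "handle_endtag" or by tracking nesting with a stack
in place of the single "current_tag" attribute.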
-- 
Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.


