converting an html table to a tree

Sami Hangaslammi shang.spam.block at st.jyu.fi
Fri Aug 25 11:52:56 EDT 2000


Alex Martelli <alex at magenta.com> wrote in message
news:8o5pfs0ujc at news2.newsguy.com...

<snip>

> There's no end to the amount of such trouble you can get into,
> trying to parse HTML (or XML) documents by regular expressions.
> I _strongly_ urge anybody having to parse HTML (or XML) to
> rely on suitable parsers rather than trying to roll their own.
>
> Python's htmllib and sgmllib may not be perfect, but they're
> much better than nothing, and I'm positive they will reduce
> your stress-level (and number of obscure never-tested bugs
> waiting to happen) compared with the roll-your-own approach.

Yes, I fully agree with you. The ready-made libraries are definitely
the way to go if you are parsing arbitrary HTML pages. Regexps, IMHO,
only work as a quick solution when the exact form of the input is
known beforehand.
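
To make the trade-off concrete, here is a rough sketch of both approaches
pulling rows out of a table. It is only an illustration under assumptions:
the SAMPLE markup and the names rows_by_regex, TableRows and rows_by_parser
are made up for this example, and it uses html.parser from current Python's
standard library in the role htmllib/sgmllib play in the quoted text.

```python
import re
from html.parser import HTMLParser

# Hypothetical input whose form is known in advance: one table, no nesting,
# lowercase tags, no attributes on <tr> or <td>.
SAMPLE = "<table><tr><td>a</td><td>b</td></tr><tr><td>c</td><td>d</td></tr></table>"


def rows_by_regex(html):
    """Quick approach: only reliable when the markup has exactly this form."""
    rows = re.findall(r"<tr>(.*?)</tr>", html, re.S)
    return [re.findall(r"<td>(.*?)</td>", row, re.S) for row in rows]


class TableRows(HTMLParser):
    """Parser approach: copes with attributes, tag case and stray whitespace."""

    def __init__(self):
        super().__init__()
        self.rows = []          # list of rows, each a list of cell strings
        self._in_cell = False   # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_cell:
            self.rows[-1][-1] += data


def rows_by_parser(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows
```

On SAMPLE both functions give the same rows. Change the input to uppercase
tags or add an attribute to a cell and the regexp version silently returns
nothing, while the parser version still finds the cells. This sketch does
not handle nested tables or <th> cells.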

-- Sami Hangaslammi --




