Hands-on HTML Table Parser/Matrix?
Sebastian "lunar" Wiesner
basti.wiesner at gmx.net
Sun Jul 6 11:34:04 EDT 2008
robert <no-spam at no-spam-no-spam.invalid>:
> Often I want to extract some web table contents. Formats are
> mostly static, simple text & numbers in it, other tags to be
> stripped off. So a simple & fast approach would be ok.
>
> What of the different modules around is most easy to use, stable,
> up-to-date, iterator access or best matrix-access (without need
> for callback functions,classes.. for basic tasks)?
Not more than a handful of lines with lxml.html:

    def htmltable2matrix(table):
        """Converts a html table to a matrix.

        :param table: The html table element
        :type table: An lxml element
        """
        matrix = []
        for row in table:
            matrix.append([e.text_content() for e in row])
        return matrix

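
A minimal usage sketch, assuming lxml is installed. The sample markup, the id "prices" and the helper name table_to_matrix are made up for illustration. Walking the rows with table.iter('tr') instead of looping over the table directly is a variation on the function above, chosen so that rows wrapped in <thead> or <tbody> are picked up as well:

```python
import lxml.html


def table_to_matrix(table):
    """Return the cell texts of an lxml table element as a list of rows.

    Hypothetical variant of htmltable2matrix, for illustration only.
    """
    matrix = []
    # iter('tr') descends into <thead>/<tbody>, so wrapped rows are found too.
    for row in table.iter('tr'):
        # Keep only real cells; text_content() drops nested tags such as <b>.
        cells = [c for c in row if c.tag in ('td', 'th')]
        matrix.append([c.text_content().strip() for c in cells])
    return matrix


# Made-up sample document; a real page would be fetched or read from a file.
SAMPLE = """
<html><body>
<table id="prices">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td><b>Apple</b></td><td>1.20</td></tr>
  <tr><td>Pear</td><td>0.95</td></tr>
</table>
</body></html>
"""

doc = lxml.html.fromstring(SAMPLE)
table = doc.xpath('//table[@id="prices"]')[0]
matrix = table_to_matrix(table)
for row in matrix:
    print(row)
```

The result is a plain list of lists, so matrix[1][0] gives the first cell of the second row without any callbacks or parser classes.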
--
Freedom is always the freedom of dissenters.
(Rosa Luxemburg)