Hands-on HTML Table Parser/Matrix?

Sebastian "lunar" Wiesner basti.wiesner at gmx.net
Sun Jul 6 11:34:04 EDT 2008


robert <no-spam at no-spam-no-spam.invalid>:

> Often I want to extract some web table contents. Formats are
> mostly static, simple text & numbers in it, other tags to be
> stripped off. So a simple & fast approach would be ok.
> 
> What of the different modules around is most easy to use, stable,
> up-to-date, iterator access or, best, matrix access (without need
> for callback functions, classes... for basic tasks)?

It takes no more than a handful of lines with lxml.html:

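def htmltable2matrix(table):
    """Converts an HTML table to a matrix (a list of rows of cell text).

    :param table:  The HTML table element
    :type table:  An lxml.html element
    """
    matrix = []
    # iter('tr') finds rows at any depth, including rows wrapped in
    # <thead>/<tbody>/<tfoot>; iterating the table directly would yield
    # those wrappers instead of the rows.
    for row in table.iter('tr'):
        # Keep only <td>/<th> cells; text_content() returns the cell's
        # text with any nested tags stripped.
        cells = row.iterchildren('td', 'th')
        matrix.append([cell.text_content() for cell in cells])
    return matrix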
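For completeness, here is a rough usage sketch. It repeats the function in a compact form so the snippet runs on its own; it looks for rows with iter('tr') and takes only <td>/<th> cells. The sample HTML and the table id "prices" are made up for illustration, and picking the table by id via xpath() is just one way to get hold of the element:

```python
import lxml.html

def htmltable2matrix(table):
    """Return the table's rows as lists of cell text."""
    return [[cell.text_content() for cell in row.iterchildren('td', 'th')]
            for row in table.iter('tr')]

# Hypothetical sample document; the <thead>/<tbody> wrappers and the
# inline <b> tag show that both are handled.
html = """
<html><body>
<table id="prices">
  <thead><tr><th>Item</th><th>Price</th></tr></thead>
  <tbody>
    <tr><td>apple</td><td><b>1.20</b></td></tr>
    <tr><td>pear</td><td>0.95</td></tr>
  </tbody>
</table>
</body></html>
"""

doc = lxml.html.fromstring(html)
# xpath() returns a list of matches; take the first (only) table.
table = doc.xpath('//table[@id="prices"]')[0]
matrix = htmltable2matrix(table)
print(matrix)  # → [['Item', 'Price'], ['apple', '1.20'], ['pear', '0.95']]
```

All cells come back as strings, so numeric columns still need an explicit float() or int() afterwards.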



-- 
Freedom is always the freedom of dissenters.
                                      (Rosa Luxemburg)
