ANNOUNCE: ClientTable: HTML table parsing

jjl@pobox.com
Thu, 6 Feb 2003 15:28:28 +0000


http://wwwsearch.sourceforge.net/ClientTable/

WARNING: This is an alpha release: interfaces will change, and don't
expect everything to work!  I'm looking for feedback on the API at the
moment, so comments are particularly welcome.

ClientTable is a Python module for generic HTML table parsing.  It is
most useful when used in conjunction with other parsers (htmllib or
HTMLParser, regular expressions, etc.), to divide up the parsing work
between your own code and ClientTable.

 import ClientTable
 import urllib2
 response = urllib2.urlopen("http://www.acme.com/tables.html")
 tables = ClientTable.ParseFile(response, collapse_whitespace=1)
 table = tables[0]
 # Indexing a table with a string-like object gets the column under that
 # header.  ClientTable uses the first row of headers in the table by
 # default.
 assert str(table.headers_row[0]) == "Widget production"
 row = table[1]
 col = table["Widget production"]
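 # The same cell object can be reached through the column, through the
 # row by header, or through the row by column number.
 cell = col[1]
 cell2 = row["Widget production"]
 cell3 = row.get_cell_by_nr(0)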
 assert cell is cell2 is cell3
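
For anyone who wants to see the header-indexing idea without installing
ClientTable, it can be approximated with the standard library alone.  This
is only a rough sketch, not how ClientTable is implemented: the
SimpleTableParser class, the column() helper and the sample HTML are all
made up for illustration, it assumes Python 3's html.parser, and it ignores
nested tables, rowspan/colspan and unclosed cell tags.  Unlike the example
above, column() returns only the data cells, without the header.

```python
from html.parser import HTMLParser


class SimpleTableParser(HTMLParser):
    """Hypothetical helper: collect the cell text of every <table> as rows."""

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows (lists of strings)
        self._row = None   # cells of the row being parsed, or None
        self._cell = None  # text fragments of the cell being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def column(table, header):
    """Return the cells under `header`, treating the first row as headers."""
    index = table[0].index(header)
    return [row[index] for row in table[1:]]


# Made-up sample data, for illustration only.
html = """
<table>
  <tr><th>Widget production</th><th>Year</th></tr>
  <tr><td> 120 </td><td>2001</td></tr>
  <tr><td>150</td><td>2002</td></tr>
</table>
"""
parser = SimpleTableParser()
parser.feed(html)
parser.close()
table = parser.tables[0]
widgets = column(table, "Widget production")
```

Keeping the parsed table as plain lists of strings means any other code
(regular expressions, your own parser) can pick the cells up from there.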


Python 2.2 or above is required.  I will probably backport it to at
least Python 2.0 later.

For full documentation, see the docstrings in ClientTable.py.


John