Hands-on HTML Table Parser/Matrix?

robert no-spam at no-spam-no-spam.invalid
Sun Jul 6 10:42:04 EDT 2008


Tim Cook wrote:
>  
> On Sun, 2008-07-06 at 14:40 +0200, robert wrote:
>> I often want to extract the contents of web tables. The formats are
>> mostly static, with simple text and numbers in the cells and other
>> tags to be stripped off, so a simple and fast approach would be OK.
>>
>> Which of the various modules around is the easiest to use, stable,
>> and up to date, with iterator access or, better, matrix access
>> (without needing callback functions, classes, etc. for basic tasks)?
>>
> There are a couple of HTML examples using Pyparsing here:
>
> http://pyparsing.wikispaces.com/Examples
>
>

Hm, nothing there that deals specifically with HTML tables.

Meanwhile:

I dislike "ClientTable" (file-centric, too many parsing errors in 
real-world use).

"TableParse" works. It is a very simple and fast 70-liner: 
regexp -> matrix, plus strip/clean and HTML-entity conversion. 
It gave quick hands-on success. It deliberately does not separate 
nested tables or handle similar complexities, but it works for 
simple hands-on tasks in the real world.
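
To illustrate the regexp -> matrix idea, here is a minimal sketch. 
It is not the TableParse source; the names (ROW_RE, CELL_RE, clean, 
table_to_matrix) are made up for this example, and it assumes 
Python 3 (for html.unescape) and flat, non-nested tables:

```python
import re
from html import unescape

# Hypothetical names; this is an illustration, not the TableParse code.
# Assumes flat tables: nested tables will confuse the lazy matches.
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr\s*>", re.I | re.S)
CELL_RE = re.compile(r"<t[dh]\b[^>]*>(.*?)</t[dh]\s*>", re.I | re.S)
TAG_RE = re.compile(r"<[^>]+>")


def clean(cell):
    """Strip inner tags, convert HTML entities, collapse whitespace."""
    text = TAG_RE.sub("", cell)
    text = unescape(text)
    return " ".join(text.split())


def table_to_matrix(html):
    """Return a list of rows, each a list of cleaned cell strings."""
    return [
        [clean(cell) for cell in CELL_RE.findall(row)]
        for row in ROW_RE.findall(html)
    ]


sample = (
    "<table>"
    "<tr><th>Name</th><th>Qty</th></tr>"
    "<tr><td><b>Apples</b></td><td> 3 </td></tr>"
    "<tr><td>Fish &amp; chips</td><td>12</td></tr>"
    "</table>"
)
matrix = table_to_matrix(sample)
```

The result is a plain list of lists, so matrix[1][0] gives the first 
cell of the second row, with no callbacks or classes needed.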


Robert


