Extract information from HTML table

irstas at gmail.com
Sun Apr 1 08:52:04 EDT 2007


On Apr 1, 3:13 pm, "Ulysse" <maxim... at gmail.com> wrote:
> Hello,
>
> I'm trying to extract the data from HTML table. Here is the part of
> the HTML source :
>
> ....
>
> Do you know the way to do it ?

Beautiful Soup is an easy way to parse HTML, including HTML that is malformed.
http://www.crummy.com/software/BeautifulSoup/

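Here's the start of a parser for your HTML:

from bs4 import BeautifulSoup  # Beautiful Soup 4 (pip install beautifulsoup4)

soup = BeautifulSoup(txt, 'html.parser')
for tr in soup('tr'):
    # Skip the first cell; this assumes every row has exactly three <td> cells
    dateTd, textTd = tr('td')[1:]
    print('Date :', dateTd.contents[0].strip())
    print(textTd)  # element still needs parsing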

where txt is the string in your message.
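
If you can't install Beautiful Soup, the same row-by-row idea can be roughly sketched with the standard library's html.parser instead. This is a different technique from the loop above, not a drop-in replacement, and it is far less forgiving of broken markup. The SAMPLE table, the TableRows class and the extract_rows helper are made up for illustration (your real HTML wasn't included here), and the three-cells-per-row layout is an assumption carried over from the loop above.

```python
from html.parser import HTMLParser

# Hypothetical sample; the real HTML was not included in the message.
SAMPLE = """
<table>
  <tr><td>1</td><td> 2007-03-30 </td><td>First <b>entry</b></td></tr>
  <tr><td>2</td><td> 2007-04-01 </td><td>Second entry</td></tr>
</table>
"""


class TableRows(HTMLParser):
    """Collect the text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
            self._cell = None
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td>
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            # Nested tags such as <b> are dropped; their text is kept
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return a list of rows, each row a list of stripped cell texts."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Assumes three cells per row: an ignored first cell, a date, and a text
for _, date, text in extract_rows(SAMPLE):
    print('Date :', date)
    print('Text :', text)
```

Here the text cell is flattened to plain text, so any markup inside it is lost. If you need the nested tags, stay with Beautiful Soup and walk textTd instead.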



