Extract information from HTML table

anjesh anjeshtuladhar at gmail.com
Mon Apr 2 04:38:55 EDT 2007


On Apr 2, 12:54 am, "Dotan Cohen" <dotanco... at gmail.com> wrote:
> On 1 Apr 2007 07:56:04 -0700, Ulysse <maxim... at gmail.com> wrote:
>
> > I have looked at the Beautiful Soup online help and tried to apply it to
> > my problem, but it seems a little bit hard. I would rather try to
> > do this with regular expressions...
>
> If you think that Beautiful Soup is difficult, then wait till you try
> to do this with regexes. Granted, knowing the exact format of the HTML
> you are scraping will help, but if you ever need to parse HTML from an
> unknown source, then Beautiful Soup is the only way to go. Not all HTML
> authors close their td and tr tags, and sometimes those tags carry
> attributes. If you plan on ever reusing the code, or the format of
> the HTML may change, then you are best off sticking with Beautiful
> Soup.
>
> Dotan Cohen
>
> http://lyricslist.com/
> http://what-is-what.com/


Have you tried HTMLParser? It can do the task you want to perform:
http://docs.python.org/lib/module-HTMLParser.html
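
As a rough illustration of the HTMLParser approach, here is a minimal sketch that collects table cell text row by row. It assumes Python 3, where the module linked above is available as html.parser. The names TableExtractor and extract_rows are made up for this example, and nested tables are not handled. A new td or tr is treated as closing the previous one, so unclosed tags and tags with attributes should still come out correctly.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Hypothetical helper: collect <td>/<th> text, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def _close_cell(self):
        # Finish the current cell, if any, and add it to the current row.
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _close_row(self):
        # Finish the current row, if any, including a still-open cell.
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()    # a new <tr> implies the previous row ended
            self._row = []
        elif tag in ("td", "th"):
            self._close_cell()   # a new cell implies the previous one ended
            if self._row is None:
                self._row = []   # cell without an enclosing <tr>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html):
    """Return the table contents of `html` as a list of rows of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    parser._close_row()  # flush a row left open at the end of the input
    return parser.rows


if __name__ == "__main__":
    sample = '<table><tr><td class="a">1<td>2</tr><tr><td>x</td><td>y</td></table>'
    for row in extract_rows(sample):
        print(row)
```

Each row comes back as a plain list of strings, so picking out a particular column is ordinary list indexing.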

-anjesh



