Extract information from HTML table

Cameron Laird claird at lairds.us
Mon Apr 2 15:28:45 EDT 2007


In article <1175503135.234560.51730 at n59g2000hsh.googlegroups.com>,
anjesh <anjeshtuladhar at gmail.com> wrote:
>On Apr 2, 12:54 am, "Dotan Cohen" <dotanco... at gmail.com> wrote:
>> On 1 Apr 2007 07:56:04 -0700, Ulysse <maxim... at gmail.com> wrote:
>>
>> > I have seen the Beautiful Soup online help and tried to apply it to
>> > my problem, but it seems to be a little bit hard. I would rather try
>> > to do this with regular expressions...
>>
>> If you think that Beautiful Soup is difficult, then wait till you try
>> to do this with regexes. Granted, knowing the exact format of the HTML
>> you are scraping will help, but if you ever need to parse HTML from an
>> unknown source, then Beautiful Soup is the only way to go. Not all HTML
>> authors close their td and tr tags, and sometimes there are attributes
>> on those tags. If you plan on ever reusing the code, or the format of
>> the HTML may change, then you are best off sticking with Beautiful
>> Soup.
>>
>> Dotan Cohen
>>
>> http://lyricslist.com/
>> http://what-is-what.com/
>
>
>Have you tried HTMLParser? It can do the task you want to perform:
>http://docs.python.org/lib/module-HTMLParser.html
>
>-anjesh
>

Yes, except that these last two follow-ups UNDERstate the difficulty--in
fact, the impossibility--of achieving adequate results on this problem
with regular expressions.  We'll help with the documentation for HTMLParser
and BeautifulSoup.  REs are an invitation to madness.

<URL: http://www.unixreview.com/documents/s=10121/ur0702e/ > might amuse 
those who want to think more about REs.
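
For anyone who wants a concrete starting point with HTMLParser, here is a
minimal sketch rather than a finished solution. It assumes current Python,
where the module lives at html.parser (the documentation link quoted above
is for the older HTMLParser module). The names TableExtractor, extract_rows
and SAMPLE are invented for this illustration, and the sample markup is made
up to show the cases mentioned earlier in the thread: unclosed td and tr tags,
and tags that carry attributes.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every table cell, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _close_cell(self):
        # Finish the open cell, if any, and add its text to the current row.
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _close_row(self):
        # Finish the open row, if any, including a cell left unclosed.
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored; only the tag name matters here.
        if tag == "tr":
            self._close_row()      # a new <tr> implicitly ends the previous row
            self._row = []
        elif tag in ("td", "th"):
            if self._row is None:  # cell that appears without an enclosing <tr>
                self._row = []
            self._close_cell()     # a new cell implicitly ends the previous one
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        # Keep text only while inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._close_row()          # flush anything still open at end of input


def extract_rows(html):
    """Return the table cells found in html as a list of rows."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample: missing </td>, </th> and </tr>, plus attributes on tags.
SAMPLE = """
<table border="1">
  <tr><th>Name<th>Price
  <tr class="odd"><td>Spam</td><td align="right">1.50
  <tr><td>Eggs<td>2.25</td></tr>
</table>
"""

rows = extract_rows(SAMPLE)
for row in rows:
    print(row)
```

The sketch treats a new tr, td or th as the end of whatever was open before
it, which is how it copes with missing close tags. It does not handle nested
tables, which would be merged into the rows of the outer one.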


