newb: BeautifulSoup

Stefan Behnel stefan.behnel-n05pAM at web.de
Fri Sep 21 01:31:35 EDT 2007


TheFlyingDutchman wrote:
> On Sep 20, 8:04 pm, crybaby <joemystery... at gmail.com> wrote:
>> I need to traverse an HTML page with a big table that has many rows and
>> columns.  For example, how do I go to the 35th td tag and use a regex to
>> retrieve the content?  After that is done, how do I move down 15 td tags
>> from the 35th tag (35+15) and use a regex to retrieve the content?
> 
> Make the file an xhtml file (valid xml) if it isn't already and then
> you can use software written to process XML files:
> 
> http://pyxml.sourceforge.net/topics/

... or just use software that can process XML and HTML the same way *and* that
supports XPath and tree iteration so that you can easily select the content
you want.

http://codespeak.net/lxml/
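
For the concrete question above (the 35th td, then the one 15 cells further
on), a minimal sketch with lxml might look like the following. The helper name
nth_cells and the generated sample table are made up for illustration; only
lxml.html.fromstring(), xpath() and text_content() are lxml calls. It assumes
that "35th td" means the 35th td element in document order.

```python
import re
from lxml import html


def nth_cells(page_source, positions):
    """Return the text of the <td> cells at the given 1-based positions.

    Hypothetical helper: cells are counted in document order across the
    whole page, which is an assumption about what "35th td" means.
    """
    tree = html.fromstring(page_source)
    cells = tree.xpath("//td")  # every <td>, in document order
    return [cells[p - 1].text_content() for p in positions]


# Made-up sample page: 10 rows x 6 columns, cells labelled "item 1".."item 60".
rows = "".join(
    "<tr>" + "".join(f"<td>item {r * 6 + c + 1}</td>" for c in range(6)) + "</tr>"
    for r in range(10)
)
page = f"<html><body><table>{rows}</table></body></html>"

# Pick the 35th cell and the one 15 cells after it, then apply a regex to each.
first, second = nth_cells(page, [35, 35 + 15])
numbers = [re.search(r"\d+", text).group() for text in (first, second)]
print(numbers)
```

If the page holds several tables, narrowing the XPath expression to the table
you care about before counting cells is probably safer than counting across
the whole document.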

Stefan


