How to find <tag> to </tag> HTML strings and 'save' them?

Jorge Godoy jgodoy at gmail.com
Sun Mar 25 13:51:58 EDT 2007


mark at agtechnical.co.uk writes:

> Hi All,
>
> Apologies for the newbie question but I've searched and tried all
> sorts for a few days and I'm pulling my hair out ;[
>
> I have a 'reference' HTML file and a 'test' HTML file from which I
> need to pull 10 strings, all of which are contained within <h2> tags,
> e.g.:
> <h2 class=r><a href="http://www.someplace.com/">Go Someplace</a></h2>
>
> Once I've found the 10, I'd like to write them to another 'results'
> HTML file, perhaps a 'reference results' and a 'test results' file,
> from which I would then like to 'diff' the results to see if they
> match.
>
> Here's the rub: I cannot find a way to pull those 10 strings so I can
> save them to the results pages.
> Can anyone please suggest how this can be done?
>
> I've tried all sorts, but I've been learning Python for 1 week and just
> don't know enough to mod example scripts, it seems. Don't even get me
> started on the Python docs... ayaa ;] Please feel free to teach me to suck
> eggs because it's all new to me :)
>
> Thanks in advance,
>
> Mark.


Take a look at BeautifulSoup.  It is easy to use and copes well with the
malformed HTML you are likely to run into.
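If installing BeautifulSoup isn't an option, the same three steps (pull the <h2> strings, save them, diff reference against test) can be sketched with nothing but the Python 3 standard library: html.parser.HTMLParser for extraction and difflib for the comparison. This is a swapped-in technique, not BeautifulSoup itself, and the names H2Collector, extract_h2, save_results and diff_h2 are hypothetical. It assumes each <h2> is properly closed; entity references such as &amp; come back decoded and comments inside an <h2> are dropped.

```python
from html.parser import HTMLParser
import difflib


class H2Collector(HTMLParser):
    """Rebuild the markup of every <h2>...</h2> element, in document order."""

    def __init__(self):
        # convert_charrefs=True: entities such as &amp; arrive decoded in handle_data
        super().__init__(convert_charrefs=True)
        self.results = []  # one string per complete <h2> element
        self._depth = 0    # > 0 while inside an <h2>
        self._parts = []   # fragments of the <h2> currently being collected

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._depth += 1
        if self._depth:
            # Keep the start tag exactly as written in the source
            self._parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> have no separate end tag to add
        if self._depth:
            self._parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._depth:
            return
        self._parts.append(f"</{tag}>")
        if tag == "h2":
            self._depth -= 1
            if self._depth == 0:
                self.results.append("".join(self._parts))
                self._parts = []

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)


def extract_h2(html_text):
    """Return a list of the <h2> elements found in html_text."""
    parser = H2Collector()
    parser.feed(html_text)
    parser.close()
    return parser.results


def save_results(strings, path):
    """Write one extracted string per line to a 'results' file."""
    with open(path, "w", encoding="utf-8") as f:
        for s in strings:
            f.write(s + "\n")


def diff_h2(reference_html, test_html):
    """Unified diff of the <h2> lists from two documents (empty list = match)."""
    return list(difflib.unified_diff(
        extract_h2(reference_html), extract_h2(test_html),
        "reference", "test", lineterm=""))
```

For the files described above, that would mean calling save_results(extract_h2(html), "reference_results.html") on each input (file names hypothetical) and then printing "\n".join(diff_h2(reference_html, test_html)); an empty diff means the 10 headings match. With BeautifulSoup 4 the extraction step shrinks to roughly [str(h2) for h2 in BeautifulSoup(html, "html.parser").find_all("h2")], though BeautifulSoup re-serializes attributes (class=r becomes class="r"), so the strings won't be byte-for-byte copies of the source.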

-- 
Jorge Godoy      <jgodoy at gmail.com>


