How to extract a part of html file

Joe dinamo99 at lycos.com
Thu Oct 20 13:12:48 EDT 2005


Thanks, Mike, that is just what I was looking for. I have looked at
BeautifulSoup, but it doesn't really do what I want it to do; maybe I'm
just new to Python and don't yet know exactly what it is doing.
However, string find works. Thanks.

On Thu, 20 Oct 2005 09:47:37 -0400, Mike Meyer wrote:

> Ben Finney <bignose+hates-spam at benfinney.id.au> writes:
> 
>> Joe <dinamo99 at lycos.com> wrote:
>>> I'm trying to extract part of html code from a tag to a tag
>> For tag soup, use BeautifulSoup:
>>     <URL:http://www.crummy.com/software/BeautifulSoup/>
> 
> Except he's trying to extract an apparently random part of the file.
> BeautifulSoup is a wonderful thing for dealing with X/HTML documents as
> structured documents, which is how you want to deal with them most of
> the time.
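BeautifulSoup is a third-party package. As a dependency-free stand-in, here is a minimal sketch of the same "structured document" idea using the standard library's html.parser module (Python 3; in Python 2 the module was named HTMLParser). The class name SpanTextCollector, the sample HTML and the target class "boldyellow" are illustrative assumptions, not anything from the thread:

```python
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Collect the text inside <span> elements that carry a given class.

    Hypothetical helper for illustration: it walks the document as a
    stream of tags instead of treating it as one flat string.
    """

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # > 0 while inside a matching <span>
        self.chunks = []    # text fragments found inside matching spans

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            # Count nested spans once we are inside a matching one.
            if self.depth or self.css_class in classes:
                self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


html = ('<p>skip</p><span class="boldyellow"><b><u>Hello</u></b> world'
        '</span><span>no</span>')
parser = SpanTextCollector("boldyellow")
parser.feed(html)
parser.close()
print("".join(parser.chunks))  # Hello world
```

The trade-off Mike describes shows up here: the parser copes with tag structure and nesting, but it cannot cut out an arbitrary run of text that starts and ends in the middle of unrelated tags.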
> 
> In this case, a regular expression (the re module) works nicely:
> 
> >>> import re
> >>> s = '<span class="boldyellow"><B><U>  and ends with TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>'
> >>> r = re.match('<span class="boldyellow"><B><U>(.*)TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>', s)
> >>> r.group(1)
> '  and ends with '
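The pattern above works partly by luck: its one regex metacharacter, the "." in "some.gif", happens to match itself, and re.match only succeeds because the start marker sits at position 0. A slightly more defensive sketch (the function name between_re is hypothetical) escapes the literal markers with re.escape, uses re.search so the start marker may appear anywhere, takes a non-greedy group so it stops at the first end marker, and passes re.DOTALL so the captured text may span lines, as it usually does in a real HTML file:

```python
import re


def between_re(text, start_marker, end_marker):
    """Return the text between the first start_marker and the next
    end_marker, or None if either is missing (hypothetical helper)."""
    # re.escape turns the literal markers into safe regex fragments.
    pattern = re.escape(start_marker) + r"(.*?)" + re.escape(end_marker)
    # re.search scans the whole string (re.match only tries position 0);
    # re.DOTALL lets '.' match newlines as well.
    m = re.search(pattern, text, re.DOTALL)
    return m.group(1) if m else None


s = ('<html><span class="boldyellow"><B><U>  and ends with\n'
     'TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>')
start = '<span class="boldyellow"><B><U>'
end = 'TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>'
print(repr(between_re(s, start, end)))  # '  and ends with\n'
```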
> The str.find method also works really well:
> 
> >>> start = s.find('<span class="boldyellow"><B><U>') + len('<span class="boldyellow"><B><U>')
> >>> stop = s.find('TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>', start)
> >>> s[start:stop]
> '  and ends with '
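One catch with the find version: str.find returns -1 when a marker is missing, and the slice then silently yields the wrong text (a missing end marker gives s[start:-1], for example). A sketch of a helper that checks for that case; the name between_find is hypothetical:

```python
def between_find(text, start_marker, end_marker):
    """Return the text between start_marker and the following end_marker,
    or None if either one is absent (hypothetical helper)."""
    i = text.find(start_marker)
    if i == -1:
        return None
    start = i + len(start_marker)        # first character after the marker
    stop = text.find(end_marker, start)  # only look after the start marker
    if stop == -1:
        return None
    return text[start:stop]


s = ('<span class="boldyellow"><B><U>  and ends with TD><TD> '
     '<img src="http://whatever/some.gif"> </TD></TR></TABLE>')
print(repr(between_find(s, '<span class="boldyellow"><B><U>', 'TD><TD>')))
# '  and ends with '
```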
> Not a lot to choose between them.
> 
>     <mike



More information about the Python-list mailing list