How to extract a part of html file
Mike Meyer
mwm at mired.org
Thu Oct 20 09:47:37 EDT 2005
Ben Finney <bignose+hates-spam at benfinney.id.au> writes:
> Joe <dinamo99 at lycos.com> wrote:
>> I'm trying to extract part of html code from a tag to a tag
> For tag soup, use BeautifulSoup:
> <URL:http://www.crummy.com/software/BeautifulSoup/>
Except he's trying to extract an apparently arbitrary part of the
file, one that doesn't line up with the document's element
structure. BeautifulSoup is a wonderful thing for dealing with X/HTML
documents as structured documents, which is how you want to deal with
them most of the time.
In this case, an re works nicely:
>>> import re
>>> s = '<span class="boldyellow"><B><U> and ends with TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>'
>>> r = re.match(r'<span class="boldyellow"><B><U>(.*?)TD><TD> <img src="http://whatever/some\.gif"> </TD></TR></TABLE>', s)
>>> r.group(1)
' and ends with '
>>>
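A caveat for real files: re.match only matches at the start of the string, and '.' doesn't match newlines unless you ask for it. Here is a minimal sketch of a more general version, assuming the fragment can sit anywhere in the page and may span lines. The function name extract_between_re and the sample page text are made up for illustration; re.escape keeps the markers literal and the non-greedy group stops at the first end marker.

```python
import re

def extract_between_re(text, start_marker, end_marker):
    """Return the text between the first start_marker and the
    next end_marker, or None if the pair isn't found."""
    # Escape both markers so characters like '.' are matched literally.
    pattern = re.escape(start_marker) + r'(.*?)' + re.escape(end_marker)
    # re.search scans the whole string; re.DOTALL lets '.' cross newlines.
    m = re.search(pattern, text, re.DOTALL)
    return m.group(1) if m else None

# Hypothetical page: the fragment is surrounded by other text and spans lines.
page = ('junk before\n'
        '<span class="boldyellow"><B><U> and ends\nwith '
        'TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>\n'
        'junk after')

result = extract_between_re(
    page,
    '<span class="boldyellow"><B><U>',
    'TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>')
print(repr(result))
```

If either marker is missing you get None back instead of an AttributeError from calling group on None.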
The string find method also works really well:
>>> start = s.find('<span class="boldyellow"><B><U>') + len('<span class="boldyellow"><B><U>')
>>> stop = s.find('TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>', start)
>>> s[start:stop]
' and ends with '
>>>
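One thing the session above glosses over: find returns -1 when the marker isn't there, and the slice then quietly gives you the wrong text. A rough sketch of the same idea with that case checked, assuming you'd rather get None than garbage (extract_between_find is a hypothetical name, not anything from the standard library):

```python
def extract_between_find(text, start_marker, end_marker):
    """Return the text between the first start_marker and the
    next end_marker, or None if either marker is missing."""
    begin = text.find(start_marker)
    if begin == -1:
        return None          # start marker not present
    begin += len(start_marker)
    # Only look for the end marker after the start marker.
    end = text.find(end_marker, begin)
    if end == -1:
        return None          # end marker not present
    return text[begin:end]

sample = ('<span class="boldyellow"><B><U> and ends with '
          'TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>')

found = extract_between_find(
    sample,
    '<span class="boldyellow"><B><U>',
    'TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>')
print(repr(found))
```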
Not a lot to choose between them.
<mike
--
Mike Meyer <mwm at mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.