HTML Parser - beginner needs help
Alex Martelli
aleaxit at yahoo.com
Thu Sep 14 17:51:56 EDT 2000
"zet" <zet at i.com.ua> wrote in message
news:968956212.35650 at ipt2.iptelecom.net.ua...
> Can somebody provide a small piece of code that returns a list of img tags?
> I've been trying these lines:
>
> class IMGParser(HTMLParser):
>     def end_img(arg):
>         return
>
> but it returns only the anchors. How do I get the IMGs?
The general idea:
    import sgmllib

    class Imgs(sgmllib.SGMLParser):
        # called once for each <img> tag, with a list of (name, value) pairs
        def do_img(self, attributes):
            print attributes

    getim = Imgs()
    getim.feed(open("c:/mydocu~1/samba98.htm").read())
    getim.close()
giving output such as:
    [('height', '51'), ('src', 'Samba98_files/cllogo_medium.gif'), ('width', '220')]
    [('height', '28'), ('src', 'Samba98_files/button_home.gif'), ('width', '28')]
    [('height', '28'), ('src', 'Samba98_files/button_up.gif'), ('width', '28')]
    [('height', '28'), ('src', 'Samba98_files/button_home.gif'), ('width', '28')]
    [('height', '28'), ('src', 'Samba98_files/button_up.gif'), ('width', '28')]
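Since each attribute list is just a sequence of (name, value) pairs, one way to look up a single attribute by name is to turn the list into a dictionary first. A minimal sketch, using the first list above as sample data (the variable names are only for illustration):

```python
# One attribute list, exactly as printed by do_img above
attributes = [('height', '51'),
              ('src', 'Samba98_files/cllogo_medium.gif'),
              ('width', '220')]

# dict() accepts a sequence of (name, value) pairs
attrs = dict(attributes)

# .get() returns None rather than raising if the tag has no src attribute
src = attrs.get('src')
```

If a tag repeats an attribute name, dict() keeps only the last value, which is usually acceptable for img tags.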
If what you want to do is accumulate a list of the src attributes only,
for example, the class could be:
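    class Imgs(sgmllib.SGMLParser):
        def __init__(self):
            # the base class must be initialized too, or feed() will fail
            sgmllib.SGMLParser.__init__(self)
            self.imgs = []
        def do_img(self, attributes):
            # attributes is a list of (name, value) pairs, not a dictionary
            for name, value in attributes:
                if name == 'src':
                    self.imgs.append(value)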
and the end result would be left in the .imgs field of the object after
.close is called (of course, you could make an accessor method for
that, if you so desire).
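Note that sgmllib no longer exists in Python 3. There, roughly the same idea could be sketched with html.parser.HTMLParser from the standard library, which is a different class from the ones used above. The names ImgSources, get_imgs and img_sources, and the sample HTML string, are made up for this illustration:

```python
from html.parser import HTMLParser


class ImgSources(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.imgs = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names;
        # attrs is a list of (name, value) pairs.
        if tag == 'img':
            for name, value in attrs:
                if name == 'src' and value is not None:
                    self.imgs.append(value)

    def get_imgs(self):
        # Accessor method: returns a copy of the accumulated list
        return list(self.imgs)


def img_sources(html_text):
    """Parse html_text and return the src values of its img tags."""
    parser = ImgSources()
    parser.feed(html_text)
    parser.close()
    return parser.get_imgs()


# Hypothetical sample input, mixing an anchor with two images
sample = '<p><a href="x.html">link</a><IMG SRC="a.gif" width="28"><img src="b.gif"></p>'
sources = img_sources(sample)
```

Here the anchor is skipped because only the img tag is tested for, and the result ends up in sources instead of being printed.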
Alex