HTML Parser - beginner needs help
Fredrik Lundh
effbot at telia.com
Thu Sep 14 15:40:29 EDT 2000
"zet" wrote:
> Can somebody provide small piece of code, which returns list of img tags?
> I've trying this lines:
>
> class IMGParser(HTMLParser):
>     def end_img(arg):
>         return
if you're looking for tags, sgmllib is usually easier to use.
here's an example:
# extract image tags
# (based on sgmllib-example-1.py from the eff-bot guide)
import sgmllib

class ImageParser(sgmllib.SGMLParser):

    def __init__(self, verbose=0):
        sgmllib.SGMLParser.__init__(self, verbose)
        self.images = []

    def do_img(self, attrs):
        for k, v in attrs:
            if k == "src":
                self.images.append(v)
                break

def extract(file):
    # get img tags from an HTML/SGML stream
    p = ImageParser()
    while 1:
        s = file.read(1024)
        if not s:
            break
        p.feed(s)
    p.close()
    return p.images

#
# try it out

import urllib
print extract(urllib.urlopen("http://www.python.org"))
## prints:
##
## ['./pics/PyBanner011.gif',
## './pics/PythonPoweredSmall.gif',
## 'pics/pythonHi.gif']
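
The example above relies on sgmllib, which no longer exists in Python 3. For readers on a newer interpreter, here is a rough equivalent built on html.parser.HTMLParser instead. It is only a sketch of the same idea, not the original code: the class and function names are reused for illustration, and extract() is assumed to take an already-decoded string rather than a file object.

```python
# Python 3 sketch of the same idea, using html.parser in place of sgmllib.
from html.parser import HTMLParser

class ImageParser(HTMLParser):

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lowercased;
        # attrs is a list of (name, value) pairs
        if tag == "img":
            for k, v in attrs:
                if k == "src" and v is not None:
                    self.images.append(v)
                    break

def extract(text):
    # collect the src attribute of every img tag in an HTML string
    p = ImageParser()
    p.feed(text)
    p.close()
    return p.images
```

To run it against a live page, the text could be fetched with urllib.request.urlopen(url).read() and decoded before being passed to extract().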
</F>
<!-- (the eff-bot guide to) the standard python library:
http://www.pythonware.com/people/fredrik/librarybook.htm
-->