Parsing an HTML a tag

George buffer_88 at hotmail.com
Sat Sep 24 20:16:46 EDT 2005


I'm very new to Python and have tried to read the tutorials, but I can't
work out exactly how to approach this problem.

Specifically, the showIPnums function takes a URL as input, calls the
read_page(url) function to obtain the entire page for that URL, and
then lists, in sorted order, the IP addresses implied by the "<A
HREF=...>" tags within that page.
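For the first half of that task, pulling the HREF values out of the anchor tags, here is a minimal sketch. It assumes Python 3, where htmllib no longer exists and html.parser is the standard-library replacement. The names HrefCollector and extract_hrefs are made up for illustration and are not part of the assignment.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the HREF value of every <a> tag fed to the parser.

    Hypothetical helper class, not part of the original module.
    """

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so
        # <A HREF=...> and <a href=...> are treated the same way.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_hrefs(html_text):
    """Return the HREF values found in html_text, in document order."""
    parser = HrefCollector()
    parser.feed(html_text)
    parser.close()
    return parser.hrefs


sample = '<p><A HREF="http://www.python.org/doc/">docs</A> <a name="top">x</a></p>'
print(extract_hrefs(sample))  # ['http://www.python.org/doc/']
```

The second anchor in the sample has no HREF attribute, so it contributes nothing to the list.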


"""
Module to print IP addresses of tags in web file containing HTML

>>> showIPnums('http://22c118.cs.uiowa.edu/uploads/easy.html')
['0.0.0.0', '128.255.44.134', '128.255.45.54']

>>> showIPnums('http://22c118.cs.uiowa.edu/uploads/pytorg.html')
['0.0.0.0', '128.255.135.49', '128.255.244.57', '128.255.30.11',
'128.255.34.132', '128.255.44.51', '128.255.45.53',
'128.255.45.54', '129.255.241.42', '64.202.167.129']

"""

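def read_page(url):
 # Python 2 standard-library modules
 import formatter
 import htmllib
 import urllib

 # Parse the page; the parser records each <A HREF=...> target
 # in its anchorlist attribute as it goes.
 htmlp = htmllib.HTMLParser(formatter.NullFormatter())
 htmlp.feed(urllib.urlopen(url).read())
 htmlp.close()
 return htmlp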

def showIPnums(URL):
 page=read_page(URL)

if __name__ == '__main__':
 import doctest, sys
 doctest.testmod(sys.modules[__name__])
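The remaining step, turning a list of HREF strings into a sorted list of IP numbers, could look roughly like the sketch below. It assumes Python 3. The function name ip_numbers and its resolve parameter are hypothetical, as is the choice to skip links with no host part and hosts that fail to resolve. Plain sorted() on the strings is used because the doctest output above appears to be in string order ('128.255.244.57' comes before '128.255.30.11'). The example swaps in a fake resolver with made-up hosts and documentation-range addresses, so it runs without network access.

```python
import socket
from urllib.parse import urlparse


def ip_numbers(hrefs, resolve=socket.gethostbyname):
    """Return the sorted, de-duplicated IP numbers for the hosts in hrefs.

    Hypothetical helper: `resolve` maps a host name to a dotted-quad
    string and defaults to a real DNS lookup.
    """
    hosts = set()
    for href in hrefs:
        host = urlparse(href).hostname  # None for relative links
        if host:
            hosts.add(host)

    ips = set()
    for host in hosts:
        try:
            ips.add(resolve(host))
        except OSError:
            # Assumption: hosts that cannot be resolved are skipped.
            # socket.gaierror is a subclass of OSError.
            continue
    return sorted(ips)  # string order, not numeric order


# Stand-in resolver with made-up data, used only for this example.
fake_dns = {"a.example": "192.0.2.9", "b.example": "192.0.2.10"}


def fake_resolve(host):
    try:
        return fake_dns[host]
    except KeyError:
        raise socket.gaierror("unknown host: " + host)


links = [
    "http://a.example/x",
    "http://b.example/",
    "page2.html",
    "http://a.example/y",
    "http://missing.example/",
]
print(ip_numbers(links, resolve=fake_resolve))  # ['192.0.2.10', '192.0.2.9']
```

Leaving resolve at its default makes the function do real lookups with socket.gethostbyname.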



