Newbie ? -- SGML metadata extraction
Adonis
adonisv at DELETETHISTEXTearthlink.net
Mon Jan 16 19:01:44 EST 2006
ProvoWallis wrote:
<snip>
From what I gather, here is a quick solution. There are probably better
ones on the way, but I think this accomplishes the idea.
Some helpful links:
http://docs.python.org/lib/module-sgmllib.html
http://docs.python.org/lib/module-HTMLParser.html
http://docs.python.org/lib/module-htmllib.html
---
try:
    from HTMLParser import HTMLParser   # Python 2 module name
except ImportError:
    from html.parser import HTMLParser  # renamed to html.parser in Python 3

data = """<main-section no="1">
<form id="graphic_1.tif">
<form id="graphic_2.tif">
<main-section no="2">
<form id="graphic_3.tif">
<main-section no="3">
<form id="graphic_4.tif">
<form id="graphic_5.tif">
<form id="graphic_6.tif">
"""

class ParseForms(HTMLParser):
    def handle_starttag(self, tag, attrs):
        if tag == "form":
            # the attrs argument is a list of (attribute, value) tuples;
            # converting it to a dictionary makes the attribute easier to access
            print("form id: %s" % dict(attrs).get('id'))

if __name__ == "__main__":
    parser = ParseForms()
    parser.feed(data)
    parser.close()
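The sample data nests each <form> under a <main-section>, so if you also
need to know which section each graphic belongs to (an assumption on my
part, since the original question is snipped above), the same approach
extends naturally. Below is a minimal sketch for Python 3, where the module
is html.parser. SectionForms and forms_by_section are hypothetical names of
my own, and the sketch assumes a form belongs to the most recent
main-section start tag, because the sample never closes its tags.

```python
from html.parser import HTMLParser

data = """<main-section no="1">
<form id="graphic_1.tif">
<form id="graphic_2.tif">
<main-section no="2">
<form id="graphic_3.tif">
<main-section no="3">
<form id="graphic_4.tif">
<form id="graphic_5.tif">
<form id="graphic_6.tif">
"""

class SectionForms(HTMLParser):
    """Hypothetical helper: collect form ids grouped by main-section number.

    Assumes each <form> belongs to the most recently opened <main-section>,
    since the sample SGML has no closing tags to mark where a section ends.
    """

    def __init__(self):
        super().__init__()
        self.current_section = None   # "no" attribute of the latest main-section
        self.sections = {}            # section number -> list of form ids

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)           # list of (name, value) tuples -> dict
        if tag == "main-section":
            self.current_section = attrs.get("no")
            self.sections.setdefault(self.current_section, [])
        elif tag == "form":
            # forms seen before any main-section end up under the key None
            self.sections.setdefault(self.current_section, []).append(attrs.get("id"))

def forms_by_section(text):
    """Parse text and return {section number: [form ids]} in document order."""
    parser = SectionForms()
    parser.feed(text)
    parser.close()
    return parser.sections

for no, ids in forms_by_section(data).items():
    print("section %s: %s" % (no, ", ".join(ids)))
```

For the sample data this prints one line per section, for example
"section 3: graphic_4.tif, graphic_5.tif, graphic_6.tif". Returning a dict
rather than printing inside the handler makes it easier to hand the results
to whatever does the rest of the metadata extraction.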