Regular Expression question
Neil Cerutti
horpner at yahoo.com
Mon Aug 21 09:46:44 EDT 2006
On 2006-08-21, stevebread at yahoo.com <stevebread at yahoo.com> wrote:
> Hi, I am having some difficulty trying to create a regular expression.
>
> Consider:
>
><tag1 name="john"/> <br/> <tag2 value="adj__tall__"/>
><tag1 name="joe"/>
><tag1 name="jack"/>
><tag2 value="adj__short__"/>
>
> Whenever a tag1 is followed by a tag2, I want to retrieve the
> values of the tag1:name and tag2:value attributes. So my end
> result here should be
>
> john, tall
> jack, short
>
> Ideas?

It seems to me that an HTML parser might be a better solution.
Here's a slapped-together example. It uses a simple state
machine.

from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.state = "get name"
        self.name_attrs = None
        self.result = {}

    def handle_starttag(self, tag, attrs):
        if self.state == "get name":
            if tag == "tag1":
                self.name_attrs = attrs
                self.state = "found name"
        elif self.state == "found name":
            if tag == "tag2":
                name = None
                for attr in self.name_attrs:
                    if attr[0] == "name":
                        name = attr[1]
                adj = None
                for attr in attrs:
                    if attr[0] == "value" and attr[1][:3] == "adj":
                        adj = attr[1][5:-2]
                if name == None or adj == None:
                    print "Markup error: expected attributes missing."
                else:
                    self.result[name] = adj
                self.state = "get name"
            elif tag == "tag1":
                # A new tag1 overrides the old one
                self.name_attrs = attrs

p = MyHTMLParser()
p.feed("""
<tag1 name="john"/> <br/> <tag2 value="adj__tall__"/>
<tag1 name="joe"/>
<tag1 name="jack"/>
<tag2 value="adj__short__"/>
""")
print repr(p.result)
p.close()

There's probably a better way to search for attributes in attrs
than "for attr in attrs", but I didn't think of it, and the
example I found on the net used the same idiom. The format of
attrs seems strange. Why isn't it a dictionary?
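
One possible alternative to the loop, offered only as a sketch: attrs
arrives as a list of (name, value) pairs, so dict(attrs) turns it into
a lookup table. The class name PairParser and the pending_name
attribute are made up for this illustration, and the import fallback
assumes the module is called html.parser on Python 3. Unlike the code
above, this version skips malformed markup silently instead of
printing an error, and if an attribute is repeated within a tag,
dict() keeps only the last value.

```python
try:
    from html.parser import HTMLParser   # Python 3 module name (assumption)
except ImportError:
    from HTMLParser import HTMLParser    # Python 2, as used in the post


class PairParser(HTMLParser):
    """Hypothetical variant of MyHTMLParser that uses dict(attrs)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.pending_name = None   # name from the most recent tag1, if any
        self.result = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)        # [(name, value), ...] -> {name: value}
        if tag == "tag1":
            # A new tag1 overrides the old one.
            self.pending_name = attrs.get("name")
        elif tag == "tag2" and self.pending_name is not None:
            value = attrs.get("value") or ""
            if value.startswith("adj__") and value.endswith("__"):
                # Strip the "adj__" prefix and the "__" suffix.
                self.result[self.pending_name] = value[5:-2]
            self.pending_name = None


p = PairParser()
p.feed("""
<tag1 name="john"/> <br/> <tag2 value="adj__tall__"/>
<tag1 name="joe"/>
<tag1 name="jack"/>
<tag2 value="adj__short__"/>
""")
p.close()
print(p.result)
```

The two states collapse into one variable here: pending_name is None
in the "get name" state and holds a name in the "found name" state.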
--
Neil Cerutti
Sermon Outline: I. Delineate your fear II. Disown your fear III.
Displace your rear --Church Bulletin Blooper