regex: im getting better

Duncan Booth duncan at NOSPAMrcp.co.uk
Thu Oct 3 04:26:20 EDT 2002


":B nerdy" <thoa0025 at mail.usyd.edu.au> wrote in
news:uDLm9.21074$kd3.60008 at news-server.bigpond.net.au: 

> $pattern = '|<input(\s+([^=>]*)="([^"]*)")*>|ism';
> 
> i'd like to match all the input tags's but also in a subexpression,
> i'd like to match each of the parameters in the format
> parameter_name="parameter_value"
> where parameter_name and parameter_value are strings
> 
> my pattern doesnt work, it only matches the last parameter, whats
> wrong with my pattern? and can someone show me how one would match my
> description above?
> 
> cheers
> 

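Your pattern only reports the last parameter because a capturing group
that is repeated keeps just the text matched by its final iteration.
Personally, I wouldn't even consider using regular expressions for a
parsing task like this. Try the code below instead: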

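import sgmllib  # Python 2 standard library; removed in Python 3

class MyParser(sgmllib.SGMLParser):
    # SGMLParser dispatches each <input> tag to do_input(), passing the
    # attributes as a list of (name, value) pairs.
    def do_input(self, attributes):
        print "Input tag", attributes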

if __name__=='__main__':
    data = '''
<html>
<body>
<input x="1" y="2">
<input p="q" r="s">
</body>
</html>
    '''
    parser = MyParser()
    parser.feed(data)
    parser.close()
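
Since sgmllib only exists in Python 2, here is a rough sketch of the same
idea for Python 3, assuming the standard library's html.parser module is an
acceptable substitute. The names InputTagParser and input_attributes are
made up for this illustration and are not part of any library.

```python
from html.parser import HTMLParser


class InputTagParser(HTMLParser):
    """Collect the attributes of every <input> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.inputs = []  # one list of (name, value) pairs per <input> tag

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this method.
        if tag == "input":
            self.inputs.append(attrs)


def input_attributes(html):
    """Hypothetical helper: return the attribute lists of all <input> tags."""
    parser = InputTagParser()
    parser.feed(html)
    parser.close()
    return parser.inputs


if __name__ == "__main__":
    data = '''
<html>
<body>
<input x="1" y="2">
<input p="q" r="s">
</body>
</html>
    '''
    for attrs in input_attributes(data):
        print("Input tag", attrs)
```

Collecting the pairs in a list rather than printing them inside the handler
makes the result easier to reuse, but that is only a design preference.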

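If you still want to stay with regular expressions, the usual workaround is
two passes: one pattern to find each tag, and a second to pull every
name="value" pair out of that tag's attribute text. The sketch below uses
Python's re module rather than the syntax in your post, and it assumes every
attribute value is double-quoted, which real HTML does not guarantee. The
names ORIGINAL, TAG, ATTR, last_pair_only and all_pairs are invented for the
example.

```python
import re

# The pattern from the question, with re flags standing in for "ism".
ORIGINAL = re.compile(r'<input(\s+([^=>]*)="([^"]*)")*>', re.I | re.S | re.M)

# Pass 1: find each <input ...> tag and capture its whole attribute text.
TAG = re.compile(r'<input((?:\s+[^=>\s]+="[^"]*")*)\s*>', re.I | re.S)

# Pass 2: pick out each name="value" pair from that attribute text.
ATTR = re.compile(r'([^=>\s]+)="([^"]*)"')


def last_pair_only(html):
    """Show what the original pattern captures: only the final pair."""
    match = ORIGINAL.search(html)
    if match is None:
        return None
    # Groups 2 and 3 sit inside the repeated group 1, so each iteration
    # overwrites them and only the last name/value survives.
    return match.group(2), match.group(3)


def all_pairs(html):
    """Return one list of (name, value) pairs per <input> tag."""
    return [ATTR.findall(tag.group(1)) for tag in TAG.finditer(html)]


if __name__ == "__main__":
    sample = '<input x="1" y="2">\n<input p="q" r="s">'
    print(last_pair_only(sample))
    print(all_pairs(sample))
```

This still breaks on unquoted or single-quoted values, comments, and other
things a real parser handles, which is why I would use the parser.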

-- 
Duncan Booth                                             duncan at rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?


