An HTML parsing problem

cheng magicmas at spymac.com
Fri May 27 09:42:06 EDT 2005


Hi all,

Suppose the HTML looks like this:
 <meta name = "description" content = "a test page">
 <meta name = "keywords"   content = "keyword1 keyword2">

If I use:
    def handle_starttag(self, tag, attrs):
        if tag == 'meta':
            self.attr = attrs
        self.headers += ['%s' % (self.attr)]
        self.attr = ''

I get this output:
[('name', 'description'), ('content', 'a test page')]

[('name', 'keywords'), ('content', 'keyword1 keyword2')]

Is there a way to take only the content values, like "a test page,
keyword1, keyword2"?
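
One possible approach, as a minimal sketch rather than a definitive answer: since attrs arrives as a list of (name, value) pairs, it can be turned into a dict and only the 'content' value kept. The sketch assumes Python 3, where the parser class lives in html.parser (the 2005-era module was named HTMLParser). The class name MetaContentParser and the variables html and result are made up for illustration.

```python
from html.parser import HTMLParser


class MetaContentParser(HTMLParser):
    """Collect only the content attribute of each <meta> tag."""

    def __init__(self):
        super().__init__()
        self.headers = []

    def handle_starttag(self, tag, attrs):
        if tag == 'meta':
            # attrs is a list of (name, value) pairs; a dict makes lookup easy.
            content = dict(attrs).get('content')
            if content is not None:
                self.headers.append(content)


html = '''
<meta name = "description" content = "a test page">
<meta name = "keywords"   content = "keyword1 keyword2">
'''

parser = MetaContentParser()
parser.feed(html)
parser.close()

# Join the collected values into one comma-separated string.
result = ', '.join(parser.headers)
print(result)
```

Here parser.headers holds one string per meta tag. If the individual keywords should also be comma-separated, each value could be split on whitespace with str.split() before joining.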



