a html parse problem

bruno modulix onurb at xiludom.gro
Fri May 27 10:22:51 EDT 2005


cheng wrote:
> hi,all
> 
> if the html like:
>  <meta name = "description" content = "a test page">
>  <meta name = "keywords"   content = "keyword1 keyword2">
> 
> if i use:
>     def handle_starttag(self, tag, attrs):
>         if tag == 'meta':
>            self.attr = attrs
>         self.headers += ['%s' % (self.attr)]
>         self.attr = ''



> will get the output:
> [('name', 'description'), ('content', 'a test page')]
> 
> [('name', 'keywords'), ('content', 'keyword1 keyword2')]

> is there some way to take only the content, like "a test page, keyword1,
> keyword2"?

And put it where ?-)

Well, it may look like this:

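  def handle_starttag(self, tag, attrs):
    if tag == 'meta':
        # attrs is a list of (name, value) tuples, so convert it to a
        # dict before looking up 'content'
        # (assumes self.content = [] was set up in __init__)
        try:
            self.content.append(dict(attrs)['content'])
        except KeyError:
            pass
    self.headers += ['%s' % attrs]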
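
For reference, here is a self-contained sketch of the same idea. It assumes Python 3, where the parser lives in html.parser, and the class name MetaContentParser is made up for illustration. The parser hands attrs to handle_starttag as a list of (name, value) pairs, so the sketch converts it to a dict before looking up 'content'.

```python
from html.parser import HTMLParser


class MetaContentParser(HTMLParser):
    """Collect the content attribute of every <meta> tag (hypothetical name)."""

    def __init__(self):
        super().__init__()
        self.content = []

    def handle_starttag(self, tag, attrs):
        if tag == 'meta':
            # attrs is a list of (name, value) pairs; dict() allows lookup by name
            value = dict(attrs).get('content')
            if value is not None:
                self.content.append(value)


html = '''<meta name = "description" content = "a test page">
<meta name = "keywords"   content = "keyword1 keyword2">'''

parser = MetaContentParser()
parser.feed(html)
parser.close()

# Join the collected values into a single comma-separated string
print(', '.join(parser.content))
```

Each content value is kept as one string, so the keywords stay space-separated unless you split them yourself.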

HTH
-- 
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'onurb at xiludom.gro'.split('@')])"


