HTMLLib.py use

Matthew Cepl cepl at fpm.cz
Tue May 4 11:30:37 EDT 1999


In article <001501be9586$22c896c0$f29b12c2 at pythonware.com>,
  "Fredrik Lundh" <fredrik at pythonware.com> wrote:
> you forgot to pass the HTMLParser class a valid formatter
> object.

OK, now it's much better: there is no error message. But there is still no
output from the script. I would like to get just the description from the
DESCRIPTION meta tag of a given HTML page. Is it possible to do this with
htmllib (or sgmllib, to make things simpler and hopefully faster), or do I have
to cut things out manually with a regexp? This is just a training exercise in
writing a simple script: I am a total beginner in object-oriented programming
(and in programming generally), and I am trying to write a Python port of ESR's
SiteMap (see
http://metalab.unc.edu/pub/Linux/apps/www/indexing/sitemap-1.9.tar.gz and
http://www.tuxedo.org/~esr/sitemap.html for the result). BTW, when I need the
content of the TITLE element, should that be done via start_title(), or how?

Thanks

  Matthew
-----------------------------------------------------------------------------

from htmllib import HTMLParser
from string import lower
import sys
import formatter

class WPage(HTMLParser):

    def __init__(self, verbose=0):
        # Set the default once here, so that a later <meta> tag cannot
        # wipe out a description found by an earlier one.
        self.description = ""
        HTMLParser.__init__(self, formatter.NullFormatter(), verbose)

    def do_meta(self, attributes):
        # attributes is a list of (name, value) pairs in source order,
        # so look the values up by name rather than by position.
        name = content = ""
        for key, value in attributes:
            if lower(key) == "name":
                name = lower(value)
            elif lower(key) == "content":
                content = value
        if name == "description":
            self.description = content

def test(filename='test.htm'):

    try:
        f = open(filename, 'r')
    except IOError, msg:
        # Report the file name, not the builtin "file"
        print filename, ":", msg
        sys.exit(1)
    data = f.read()
    f.close()
    x = WPage()
    x.feed(data)
    x.close()
    print x.description

if __name__ == '__main__':
    test()
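
For comparison, here is a minimal sketch of the same idea for Python 3, where
htmllib is no longer available and html.parser takes its place. It is not the
htmllib approach from the script above, only one possible way to do it. The
names PageInfo and page_info are made up for this illustration. It also shows
one way to collect the TITLE text: set a flag at the start tag and gather the
character data until the end tag.

```python
from html.parser import HTMLParser


class PageInfo(HTMLParser):
    """Collects the DESCRIPTION meta tag and the TITLE text of a page.

    PageInfo is a hypothetical name used only for this sketch.
    """

    def __init__(self):
        super().__init__()
        self.description = ""
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        if tag == "meta":
            fields = dict(attrs)
            if (fields.get("name") or "").lower() == "description":
                self.description = fields.get("content") or ""
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text between <title> and </title> is kept.
        if self._in_title:
            self.title += data


def page_info(html_text):
    """Return (description, title) for the given HTML text."""
    parser = PageInfo()
    parser.feed(html_text)
    parser.close()
    return parser.description, parser.title.strip()
```

Calling page_info(open('test.htm').read()) would then return the description
and the title as a pair of strings, each empty if the page does not have one.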
