[Tutor] HTMLParser question
Daryl Gallatin
gman_95@hotmail.com
Sat, 20 Apr 2002 13:44:44 +0000
Hi, thanks.
Yes, a GUI-based program that renders HTML is what I am looking to create:
say, a Python example of an early HTML browser.
Non-trivial application? You mean it would actually be simple to do?
The text browser example I have is from Deitel and Deitel's book. I want to
expand on it and create my own browser, at least a simple one, to see how
it works.
Thanks!
>From: Kirby Urner <urnerk@qwest.net>
>To: "Daryl Gallatin" <gman_95@hotmail.com>, tutor@python.org
>Subject: Re: [Tutor] HTMLParser question
>Date: Fri, 19 Apr 2002 22:54:25 -0400
>
>On Friday 19 April 2002 08:23 pm, Daryl Gallatin wrote:
> > Hi, is there a good example somewhere on how to use the HTMLParser class?
> > I would like to create a simple HTML web browser, but the simple example
> > provided in the documentation doesn't give me much to work on.
> >
> > I have a text browser, but I want to be able to view at least Simple HTML
> > pages
>
>If you're saying you want a GUI-based rendering of HTML that
>pays attention to a reduced tag set (e.g. <p> and some formatting),
>that's a non-trivial application.
>
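To make the "reduced tag set" idea concrete, here is a minimal sketch of the parsing half of such a renderer. It assumes Python 3's html.parser module rather than the 2002-era HTMLParser class discussed in this thread, and the StyledRuns class, its tag subset, and html_to_runs are hypothetical names chosen for illustration. It only turns markup into (text, styles) runs; drawing them in a GUI is left out.

```python
from html.parser import HTMLParser


class StyledRuns(HTMLParser):
    """Collect (text, styles) runs for a small, illustrative tag subset."""

    STYLE_TAGS = {"b": "bold", "i": "italic"}  # assumed subset, not a standard
    BLOCK_TAGS = {"p", "br"}                   # tags that start a new line

    def __init__(self):
        super().__init__()
        self.active = []  # styles currently in effect
        self.runs = []    # list of (text, tuple_of_styles)

    def handle_starttag(self, tag, attrs):
        if tag in self.STYLE_TAGS:
            self.active.append(self.STYLE_TAGS[tag])
        elif tag in self.BLOCK_TAGS:
            self.runs.append(("\n", ()))

    def handle_endtag(self, tag):
        style = self.STYLE_TAGS.get(tag)
        if style in self.active:
            self.active.remove(style)

    def handle_data(self, data):
        if data.strip():  # ignore whitespace-only text between tags
            self.runs.append((data, tuple(self.active)))


def html_to_runs(html):
    """Parse an HTML string and return its list of styled text runs."""
    parser = StyledRuns()
    parser.feed(html)
    parser.close()
    return parser.runs
```

Each run could then be handed to a GUI text widget together with its style names. Every other tag is silently ignored here, which is roughly where the non-trivial part of a real browser begins.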
>As a learning exercise it might be fun, but the practical solution is
>to just use a browser you've already got.
>
>HTMLParser class is good for stripping information from a web
>page, like maybe you just want what's between <pre> </pre> tags.
>
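As a rough sketch of that "just what's between <pre> </pre> tags" use, the following assumes Python 3's html.parser module (sgmllib is no longer in the Python 3 standard library). PreExtractor and first_pre are hypothetical names, and the input is assumed to be an HTML string that has already been downloaded.

```python
from html.parser import HTMLParser


class PreExtractor(HTMLParser):
    """Collect the text of the first <pre>...</pre> block only."""

    def __init__(self):
        super().__init__()
        self.in_pre = False  # True while inside the first <pre>
        self.done = False    # True once that block has been closed
        self.chunks = []     # text pieces seen inside the block

    def handle_starttag(self, tag, attrs):
        if tag == "pre" and not self.done:
            self.in_pre = True

    def handle_endtag(self, tag):
        if tag == "pre" and self.in_pre:
            self.in_pre = False
            self.done = True

    def handle_data(self, data):
        if self.in_pre:
            self.chunks.append(data)


def first_pre(html):
    """Return the text of the first <pre> block, or None if there is none."""
    parser = PreExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks) if parser.done else None
```

Unlike an approach that raises an exception to stop early, this sketch keeps reading to the end of the input and simply ignores everything after the first block, which is simpler but does a little more work on long pages.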
>Below is a program you can run from the OS command line, to harvest
>polyhedron data in OFF format (a text format used in computational
>geometry) from a specific website. Parameters might be cube.html
>or icosa.html (these get appended to a base URL that's hardwired into
>the program).
>
>Following Lundh's advice, I just use the SGMLParser, and find I need
>to capture <pre> and </pre> tags by overriding the unknown_starttag
>and unknown_endtag methods:
>
>#!/usr/bin/python
>"""
>Thanks to example in Python Standard Library, Lundh (O'Reilly)
>
>Excerpts data from http://www.scienceu.com/geometry/facts/solids/
>which just happens to be saved between <pre> </pre> tags, which
>are unique and/or first on each page.
>"""
>
>import sys
>import urllib
>import sgmllib
>
>class FoundPre(Exception):
>    """Raised to stop parsing once the first <pre> block is complete."""
>    pass
>
>class ExtractPre(sgmllib.SGMLParser):
>
>    def __init__(self, verbose=0):
>        sgmllib.SGMLParser.__init__(self, verbose)
>        self.pretag = self.data = None
>
>    def handle_data(self, data):
>        if self.data is not None:  # skips adding unless <pre> found
>            self.data.append(data)
>
>    def unknown_starttag(self, tag, attrs):
>        if tag == "pre":
>            self.start_pre(attrs)
>
>    def unknown_endtag(self, tag):
>        if tag == "pre":
>            self.end_pre()  # end_pre takes no arguments
>
>    def start_pre(self, attrs):
>        print "Yes!!!"  # found my <pre> tag
>        self.data = []
>
>    def end_pre(self):
>        self.pretag = self.data
>        raise FoundPre  # done parsing
>
>def getwebdata(wp):
>    p = ExtractPre()
>    try:  # clever use of exception to terminate
>        while 1:
>            s = wp.read(512)  # read the page in 512-byte chunks
>            if not s:
>                break
>            p.feed(s)
>        p.close()
>    except FoundPre:
>        return p.pretag
>    return None  # no <pre> block found
>
>if __name__ == '__main__':
>    webpage = sys.argv[1]
>    baseurl = "http://www.scienceu.com/geometry/facts/solids/coords/"
>    fp = urllib.urlopen(baseurl + webpage)
>    output = open("data.txt", "w")
>    results = getwebdata(fp)
>    fp.close()
>
>    if results:
>        for i in results:
>            output.write(i)
>    output.close()
>
>Example usage:
>
>[kirby@grunch bin]$ scipoly.py cube.html
>Yes!!!
>[kirby@grunch bin]$ cat data.txt
>
>
> OFF
> 8 6 0
> -0.469 0.000 -0.664
> 0.469 0.000 0.664
> -0.469 0.664 0.000
> -0.469 -0.664 0.000
> 0.469 0.664 0.000
> -0.469 0.000 0.664
> 0.469 0.000 -0.664
> 0.469 -0.664 0.000
>
> 4 3 7 1 5 153 51 204
> 4 1 7 6 4 153 51 204
> 4 4 1 5 2 153 51 204
> 4 5 2 0 3 153 51 204
> 4 6 0 3 7 153 51 204
> 4 4 6 0 2 153 51 204
>
>I have another Python script to read in the above file and convert
>it to Povray for rendering. Unfortunately, the data is to only 3
>significant figures, and this means some facets aren't coplanar
>enough for Povray's tastes (qhull likewise sometimes gets different
>facets, when parsing the same vertices -- I need a better source
>of coordinate data I guess).
>
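For readers who want to check the coplanarity problem themselves, here is a minimal sketch that reads OFF text shaped like the listing above and measures how far a four-sided facet's last vertex lies from the plane through its first three. It is not Kirby's conversion script. parse_off and plane_offset are hypothetical names, Python 3 is assumed, and the parser assumes every face line ends with three colour values, as in the listing.

```python
def parse_off(text):
    """Parse OFF text laid out like the listing above into (vertices, faces)."""
    tokens = text.split()
    if tokens[0] != "OFF":
        raise ValueError("not an OFF file")
    nverts, nfaces = int(tokens[1]), int(tokens[2])  # third count is ignored
    pos = 4
    vertices = []
    for _ in range(nverts):
        vertices.append(tuple(float(t) for t in tokens[pos:pos + 3]))
        pos += 3
    faces = []
    for _ in range(nfaces):
        n = int(tokens[pos])  # number of vertex indices on this face
        faces.append(tuple(int(t) for t in tokens[pos + 1:pos + 1 + n]))
        pos += 1 + n + 3      # assumption: three trailing colour values
    return vertices, faces


def plane_offset(face, vertices):
    """Distance of a face's 4th vertex from the plane of its first three."""
    a, b, c, d = (vertices[i] for i in face[:4])
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    w = [d[k] - a[k] for k in range(3)]
    # The cross product u x v is normal to the plane through a, b and c.
    normal = [u[1] * v[2] - u[2] * v[1],
              u[2] * v[0] - u[0] * v[2],
              u[0] * v[1] - u[1] * v[0]]
    length = sum(x * x for x in normal) ** 0.5
    return abs(sum(normal[k] * w[k] for k in range(3))) / length
```

An offset near zero means the facet is flat. How large an offset a renderer or hull program tolerates depends on that tool, so any threshold is a judgment call.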
>Kirby