[Tutor] HTMLParser question

Sat, 20 Apr 2002 13:44:44 +0000

Hi, thanks.
Yes, a GUI-based program that renders HTML is what I am looking to create:
say, a Python example of an early HTML browser.

non-trivial application? You mean it would actually be simple to do?

The text browser example I have is from Deitel and Deitel's book. I want to
expand on it and create my own simple HTML browser, at least to see how one
works.

Thanks!

>From: Kirby Urner <urnerk@qwest.net>
>To: "Daryl Gallatin" <gman_95@hotmail.com>, tutor@python.org
>Subject: Re: [Tutor] HTMLParser question
>Date: Fri, 19 Apr 2002 22:54:25 -0400
>
>On Friday 19 April 2002 08:23 pm, Daryl Gallatin wrote:
> > Hi, is there a good example somewhere on how to use the HTMLParser class?
> > I would like to create a simple HTML web browser, but the simple example
> > provided in the documentation doesn't give me much to work on
> >
> > I have a text browser, but I want to be able to view at least simple HTML
> > pages
>
>If you're saying you want a GUI-based rendering of HTML that
>pays attention to a reduced tag set (e.g. <p> and some formatting),
>that's a non-trivial application.
>
>As a learning exercise it might be fun, but the practical solution is
>to just use a browser you've already got.
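
For the learning-exercise route, here is a minimal sketch of the tag-handling
half of such a renderer. It is an illustration under several assumptions, not
a browser: it targets Python 3's html.parser module rather than the SGMLParser
used later in this message, the names TextRenderer and render are made up for
the example, the tag set is an arbitrary small one, and the output is plain
text instead of anything drawn in a GUI.

```python
from html.parser import HTMLParser


class TextRenderer(HTMLParser):
    """Turns a small tag set into plain text; other tags are ignored."""

    HEADINGS = {"h1", "h2", "h3"}
    BLOCK_TAGS = HEADINGS | {"p", "li"}

    def __init__(self):
        super().__init__()
        self.lines = []       # finished output lines
        self.current = []     # text fragments of the line being built
        self.heading = False  # True while inside a heading tag

    def _flush(self):
        # Collapse runs of whitespace, as a browser would, then emit the line.
        text = " ".join("".join(self.current).split())
        if text:
            self.lines.append(text.upper() if self.heading else text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS or tag == "br":
            self._flush()
        if tag in self.HEADINGS:
            self.heading = True
        if tag == "li":
            self.current.append("* ")

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self._flush()
        if tag in self.HEADINGS:
            self.heading = False

    def handle_data(self, data):
        self.current.append(data)


def render(html):
    """Return a plain-text rendering of the given HTML string."""
    renderer = TextRenderer()
    renderer.feed(html)
    renderer.close()
    renderer._flush()  # emit any text that was not closed by a block tag
    return "\n".join(renderer.lines)


page = "<h1>Title</h1><p>Hello <b>big</b>   world</p><ul><li>one</li><li>two</li></ul>"
print(render(page))
```

Headings come out in upper case and list items get a "* " prefix; inline tags
such as <b> are parsed but leave no mark. A GUI version would keep the same
three callbacks and replace the string output with calls into a text widget.
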
>
>HTMLParser class is good for stripping information from a web
>page, like maybe you just want what's between <pre> </pre> tags.
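
As a rough sketch of that kind of extraction in Python 3 (where sgmllib is no
longer part of the standard library), the following uses html.parser to
collect the text of the first <pre> block and stops parsing by raising an
exception. The names ExtractPre, FoundPre and first_pre, and the sample
string, are assumptions chosen for this example.

```python
from html.parser import HTMLParser


class FoundPre(Exception):
    """Raised to stop parsing once the first </pre> is reached."""


class ExtractPre(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = None  # stays None until a <pre> tag is seen

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.chunks = []

    def handle_data(self, data):
        # Text is collected only while inside the <pre> block.
        if self.chunks is not None:
            self.chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "pre" and self.chunks is not None:
            raise FoundPre  # nothing after this point is of interest


def first_pre(html):
    """Return the text of the first <pre> block, or None if there is none."""
    parser = ExtractPre()
    try:
        parser.feed(html)
        parser.close()
    except FoundPre:
        return "".join(parser.chunks)
    return None


sample = "<html><body><p>intro</p><pre>OFF\n8 6 0\n</pre><p>rest</p></body></html>"
print(first_pre(sample))
```

Raising from the end-tag handler means the rest of the page is never parsed,
which only matters for large pages but keeps the control flow short.
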
>
>Below is a program you can run from the OS command line, to harvest
>polyhedron data in OFF format (a text format used in computational
>geometry) from a specific website.  Parameters might be cube.html
>or icosa.html (these get appended to a base URL that's hardwired into
>the program).
>
>Following Lundh's advice, I just use the SGMLParser, and find I need
>to capture <pre> and </pre> tags by overriding the unknown_starttag
>and unknown_endtag methods:
>
>#!/usr/bin/python
>"""
>Thanks to the example in Python Standard Library, Lundh (O'Reilly)
>
>Excerpts data from http://www.scienceu.com/geometry/facts/solids/
>which just happens to be saved between <pre> </pre> tags, which
>are unique and/or first on each page.
>"""
>
>import urllib,sys
>import sgmllib
>
>class FoundPre(Exception):
>     pass
>
>class ExtractPre(sgmllib.SGMLParser):
>
>     def __init__(self,verbose=0):
>         sgmllib.SGMLParser.__init__(self,verbose)
>         self.pretag = self.data = None
>
>     def handle_data(self,data):
>         if self.data is not None: # skips adding unless <pre> found
>             self.data.append(data)
>
>     def unknown_starttag(self, tag, attrs):
>         if tag=="pre":
>            self.start_pre(attrs)
>
>     def unknown_endtag(self, tag):
>         if tag=="pre":
>            self.end_pre()
>
>     def start_pre(self,attrs):
>         print "Yes!!!"  # found my <pre> tag
>         self.data = []
>
>     def end_pre(self):
>         self.pretag = self.data
>         raise FoundPre # done parsing
>
>def getwebdata(wp):
>     p  = ExtractPre()
>     try:  # clever use of exception to terminate
>         while 1:
>             s = wp.read(512)
>             if not s:
>                 break
>             p.feed(s)
>         p.close()
>     except FoundPre:
>         return p.pretag
>     return None
>
>if __name__ == '__main__':
>     webpage = sys.argv[1]
>     baseurl = "http://www.scienceu.com/geometry/facts/solids/coords/"
>     fp = urllib.urlopen(baseurl + webpage)
>     output = open("data.txt","w")
>     results = getwebdata(fp)
>     fp.close()
>
>     if results:
>         for i in results:
>             output.write(i)
>     output.close()
>
>Example usage:
>
>[kirby@grunch bin]$ scipoly.py cube.html
>Yes!!!
>[kirby@grunch bin]$ cat data.txt
>
>
>   OFF
>      8    6    0
>     -0.469     0.000    -0.664
>      0.469     0.000     0.664
>     -0.469     0.664     0.000
>     -0.469    -0.664     0.000
>      0.469     0.664     0.000
>     -0.469     0.000     0.664
>      0.469     0.000    -0.664
>      0.469    -0.664     0.000
>
>   4   3 7 1 5     153  51 204
>   4   1 7 6 4     153  51 204
>   4   4 1 5 2     153  51 204
>   4   5 2 0 3     153  51 204
>   4   6 0 3 7     153  51 204
>   4   4 6 0 2     153  51 204
>
>I have another Python script to read in the above file and convert
>it to Povray for rendering.  Unfortunately, the data is to only 3
>significant figures, and this means some facets aren't coplanar
>enough for Povray's tastes (qhull likewise sometimes gets different
>facets, when parsing the same vertices -- I need a better source
>of coordinate data I guess).
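
To make the layout of that file concrete, here is a small Python 3 sketch that
reads OFF text into vertex and face lists. It is not the conversion script
mentioned above, which is not shown in this thread. The function name
parse_off is made up, and the sketch assumes the layout seen in the cube
listing: an "OFF" line, a counts line (vertices, faces, edges), one row per
vertex, then one row per face that starts with the number of corners. Reading
the trailing three numbers on each face row as a color is an assumption; they
are simply skipped.

```python
CUBE_OFF = """
   OFF
      8    6    0
     -0.469     0.000    -0.664
      0.469     0.000     0.664
     -0.469     0.664     0.000
     -0.469    -0.664     0.000
      0.469     0.664     0.000
     -0.469     0.000     0.664
      0.469     0.000    -0.664
      0.469    -0.664     0.000

   4   3 7 1 5     153  51 204
   4   1 7 6 4     153  51 204
   4   4 1 5 2     153  51 204
   4   5 2 0 3     153  51 204
   4   6 0 3 7     153  51 204
   4   4 6 0 2     153  51 204
"""


def parse_off(text):
    """Return (vertices, faces) from OFF text laid out as in CUBE_OFF."""
    # Blank lines carry no information, so drop them before indexing rows.
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows or rows[0] != ["OFF"]:
        raise ValueError("missing OFF header")
    n_vertices, n_faces = int(rows[1][0]), int(rows[1][1])
    body = rows[2:]
    vertices = [tuple(float(x) for x in row[:3]) for row in body[:n_vertices]]
    faces = []
    for row in body[n_vertices:n_vertices + n_faces]:
        corners = int(row[0])
        # Keep the vertex indices; anything after them is ignored.
        faces.append(tuple(int(i) for i in row[1:1 + corners]))
    return vertices, faces


vertices, faces = parse_off(CUBE_OFF)
print(len(vertices), "vertices,", len(faces), "faces")
```

Each face is a tuple of indices into the vertex list, so faces[0] refers to
vertices 3, 7, 1 and 5 of the cube.
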
>
>Kirby
