Remove spaces and line wraps from html?

Paramjit Oberoi p_s_oberoi at hotmail.com
Fri Jun 18 16:28:08 EDT 2004


>> http://groups.google.com/groups?q=HTMLPrinter&hl=en&lr=&ie=UTF-8&c2coff=1&selm=pan.2004.03.27.22.05.55.38448240hotmail.com&rnum=1
>> 
>> (or search c.l.p for "HTMLPrinter")
>
> Thanks, I forgot to mention I am new to Python, so I don't yet know how
> to use that example :(

Python has an HTMLParser module in the standard library:

http://www.python.org/doc/lib/module-HTMLParser.html
http://www.python.org/doc/lib/htmlparser-example.html

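It looks complicated if you are new to all this, but it's fairly simple
really: you subclass HTMLParser and override the handler methods you care
about.  Using it is much more reliable than picking apart HTML syntax
yourself, because the parser already deals with tags, attributes and
quoting for you.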

A small example:

--------------------------------------------------
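# The module is named HTMLParser in Python 2 and html.parser in Python 3.
try:
    from html.parser import HTMLParser
except ImportError:
    from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):
    # Called once for every opening tag, e.g. <title>
    def handle_starttag(self, tag, attrs):
        print("Encountered the beginning of a %s tag" % tag)

    # Called once for every closing tag, e.g. </title>
    def handle_endtag(self, tag):
        print("Encountered the end of a %s tag" % tag)

my_parser = MyHTMLParser()

html_data = """
<html>
  <head>
    <title>hi</title>
  </head>
  <body> hi </body>
</html>
"""

# feed() runs the parser over the text and triggers the handlers above
my_parser.feed(html_data)
my_parser.close()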
--------------------------------------------------

will produce the result:
Encountered the beginning of a html tag
Encountered the beginning of a head tag
Encountered the beginning of a title tag
Encountered the end of a title tag
Encountered the end of a head tag
Encountered the beginning of a body tag
Encountered the end of a body tag
Encountered the end of a html tag
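
The tag handlers are only part of the picture: the text between tags
arrives through handle_data(), and the attrs argument of
handle_starttag() is a list of (name, value) pairs.  A minimal sketch of
both follows; the class name TextCollector and its events list are made
up for illustration and are not part of the library.

```python
try:
    from html.parser import HTMLParser  # Python 3 location
except ImportError:
    from HTMLParser import HTMLParser   # Python 2 location

class TextCollector(HTMLParser):
    """Hypothetical helper: records the start tags and text it sees."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, e.g. [("class", "note")]
        self.events.append(("start", tag, attrs))

    def handle_data(self, data):
        # data is the raw text between tags, surrounding whitespace included;
        # whitespace-only text is skipped here
        if data.strip():
            self.events.append(("text", data.strip()))

collector = TextCollector()
collector.feed('<p class="note"> hi <b>there</b></p>')
collector.close()

for event in collector.events:
    print(event)
```

Each entry in collector.events is a tuple, so you can inspect what the
parser handed to each handler before deciding what to do with it.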

You'll be able to figure out the rest using the
documentation and some experimentation.
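
As a starting point for the original question, here is a rough sketch
that re-emits the document with line wraps and runs of spaces collapsed.
WhitespaceStripper and strip_whitespace are names invented for this
example.  It assumes that whitespace-only text between tags can be
dropped, which may change rendering between inline elements, and it does
not special-case <pre>, <script> or <textarea>, whose contents should
really be left alone.

```python
import re

try:
    from html.parser import HTMLParser  # Python 3 location
except ImportError:
    from HTMLParser import HTMLParser   # Python 2 location

class WhitespaceStripper(HTMLParser):
    """Hypothetical helper: rebuilds the HTML with whitespace collapsed."""

    def __init__(self):
        HTMLParser.__init__(self)
        # Ask Python 3 to report entities separately, as Python 2 does,
        # so that they can be written back unchanged.
        self.convert_charrefs = False
        self.pieces = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag exactly as written in the source
        self.pieces.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # self-closing tags such as <br/>
        self.pieces.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.pieces.append("</%s>" % tag)

    def handle_entityref(self, name):
        self.pieces.append("&%s;" % name)

    def handle_charref(self, name):
        self.pieces.append("&#%s;" % name)

    def handle_comment(self, data):
        self.pieces.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.pieces.append("<!%s>" % decl)

    def handle_data(self, data):
        # collapse each run of spaces/newlines to one space and
        # drop text that consists of whitespace only
        text = re.sub(r"\s+", " ", data)
        if text != " ":
            self.pieces.append(text)

def strip_whitespace(html):
    parser = WhitespaceStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.pieces)

html_data = """
<html>
  <head>
    <title>hi</title>
  </head>
  <body> hi </body>
</html>
"""

print(strip_whitespace(html_data))
```

For the html_data above this prints
<html><head><title>hi</title></head><body> hi </body></html>
on a single line.  Text that sits next to other text keeps one space, so
"a  &amp;\n b" comes out as "a &amp; b".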

HTH,
-param


