Remove spaces and line wraps from html?

RiGGa rigga at hasnomail.com
Sat Jun 19 02:40:33 EDT 2004


RiGGa wrote:

> Paramjit Oberoi wrote:
> 
>>>> http://groups.google.com/groups?q=HTMLPrinter&hl=en&lr=&ie=UTF-8&c2coff=1&selm=pan.2004.03.27.22.05.55.38448240hotmail.com&rnum=1
>>>> 
>>>> (or search c.l.p for "HTMLPrinter")
>>>
>>> Thanks, I forgot to mention I am new to Python, so I don't yet know how to
>>> use that example :(
>> 
>> Python has a HTMLParser module in the standard library:
>> 
>> http://www.python.org/doc/lib/module-HTMLParser.html
>> http://www.python.org/doc/lib/htmlparser-example.html
>> 
>> It looks complicated if you are new to all this, but it's fairly simple
>> really.  Using it is much better than dealing with HTML syntax yourself.
>> 
>> A small example:
>> 
>> --------------------------------------------------
>> from HTMLParser import HTMLParser
>> 
>> class MyHTMLParser(HTMLParser):
>>     def handle_starttag(self, tag, attrs):
>>         print "Encountered the beginning of a %s tag" % tag
>>     def handle_endtag(self, tag):
>>         print "Encountered the end of a %s tag" % tag
>> 
>> my_parser=MyHTMLParser()
>> 
>> html_data = """
>> <html>
>>   <head>
>>     <title>hi</title>
>>   </head>
>>   <body> hi </body>
>> </html>
>> """
>> 
>> my_parser.feed(html_data)
>> --------------------------------------------------
>> 
>> will produce the result:
>> Encountered the beginning of a html tag
>> Encountered the beginning of a head tag
>> Encountered the beginning of a title tag
>> Encountered the end of a title tag
>> Encountered the end of a head tag
>> Encountered the beginning of a body tag
>> Encountered the end of a body tag
>> Encountered the end of a html tag
>> 
>> You'll be able to figure out the rest using the
>> documentation and some experimentation.
>> 
>> HTH,
>> -param
> Thank you!! That was just the kind of help I was
> looking for.
> 
> Best regards
> 
> Rigga
I have just tried your example exactly as you typed
it (copy and paste), and I get a syntax error every time
I run it. It always fails at the line starting:

def handle_starttag(self, tag, attrs):

And the error message shown in the command line is:

DeprecationWarning: Non-ASCII character '\xa0'

What does this mean?

Many thanks

R
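
A possible explanation, offered as a sketch rather than a confirmed diagnosis: '\xa0' is the Latin-1 byte for a non-breaking space, which can end up in source code when it is copied from a web page or a mail reader. It looks like ordinary indentation but is not accepted as whitespace by the interpreter, which would explain a syntax error on an indented line. The snippet below assumes a modern Python 3 interpreter (the quoted example targets the Python 2 of the time), and the helper names find_nbsp_lines and replace_nbsp are made up for illustration.

```python
NBSP = "\u00a0"  # non-breaking space, reported as '\xa0' in the warning


def find_nbsp_lines(source):
    """Return the 1-based numbers of lines containing a non-breaking space."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if NBSP in line
    ]


def replace_nbsp(source):
    """Replace every non-breaking space with an ordinary space."""
    return source.replace(NBSP, " ")


# Hypothetical pasted text: the second line is indented with
# non-breaking spaces instead of ordinary ones.
pasted = (
    "class MyHTMLParser(HTMLParser):\n"
    "\u00a0\u00a0\u00a0 def handle_starttag(self, tag, attrs):\n"
    "        pass\n"
)

print(find_nbsp_lines(pasted))   # lines that need fixing
cleaned = replace_nbsp(pasted)
print(find_nbsp_lines(cleaned))  # empty once the characters are replaced
```

To apply the same idea to a saved script, read the file's text, pass it through replace_nbsp, and write it back. Retyping the indentation by hand in an editor should work just as well.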




