Remove spaces and line wraps from html?

RiGGa rigga at hasnomail.com
Sat Jun 19 03:18:31 EDT 2004


RiGGa wrote:

> RiGGa wrote:
> 
>> Paramjit Oberoi wrote:
>> 
>>>>> http://groups.google.com/groups?q=HTMLPrinter&hl=en&lr=&ie=UTF-8&c2coff=1&selm=pan.2004.03.27.22.05.55.38448240hotmail.com&rnum=1
>>>>> 
>>>>> (or search c.l.p for "HTMLPrinter")
>>>>
>>>> Thanks, I forgot to mention I am new to Python so I don't yet know how
>>>> to use that example :(
>>> 
>>> Python has a HTMLParser module in the standard library:
>>> 
>>> http://www.python.org/doc/lib/module-HTMLParser.html
>>> http://www.python.org/doc/lib/htmlparser-example.html
>>> 
>>> It looks complicated if you are new to all this, but it's fairly simple
>>> really.  Using it is much better than dealing with HTML syntax yourself.
>>> 
>>> A small example:
>>> 
>>> --------------------------------------------------
>>> from HTMLParser import HTMLParser
>>> 
>>> class MyHTMLParser(HTMLParser):
>>>     def handle_starttag(self, tag, attrs):
>>>         print "Encountered the beginning of a %s tag" % tag
>>>     def handle_endtag(self, tag):
>>>         print "Encountered the end of a %s tag" % tag
>>> 
>>> my_parser=MyHTMLParser()
>>> 
>>> html_data = """
>>> <html>
>>>   <head>
>>>     <title>hi</title>
>>>   </head>
>>>   <body> hi </body>
>>> </html>
>>> """
>>> 
>>> my_parser.feed(html_data)
>>> --------------------------------------------------
>>> 
>>> will produce the result:
>>> Encountered the beginning of a html tag
>>> Encountered the beginning of a head tag
>>> Encountered the beginning of a title tag
>>> Encountered the end of a title tag
>>> Encountered the end of a head tag
>>> Encountered the beginning of a body tag
>>> Encountered the end of a body tag
>>> Encountered the end of a html tag
>>> 
>>> You'll be able to figure out the rest using the
>>> documentation and some experimentation.
>>> 
>>> HTH,
>>> -param
>> Thank you!! That was just the kind of help I was
>> looking for.
>> 
>> Best regards
>> 
>> Rigga
> I have just tried your example exactly as you typed
> it (copy and paste) and I get a syntax error every time
> I run it. It always fails at the line starting:
> 
> def handle_starttag(self, tag, attrs):
> 
> And the error message shown in the command line is:
> 
> DeprecationWarning: Non-ASCII character '\xa0'
> 
> What does this mean?
> 
> Many thanks
> 
> R
Ignore that. I retyped it manually and it now works; it must have been a hidden
character that my IDE didn't like.
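
For anyone who hits the same warning: '\xa0' is a non-breaking space, which
looks like an ordinary space but is not ASCII, and it can come along when code
is pasted from a web page or a news reader. Below is a rough sketch of how
such characters could be located before running the file. The function name
find_non_ascii and the sample string are made up for illustration, and the
code assumes Python 3.

```python
def find_non_ascii(text):
    """Return (line_number, column, character) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            # Anything above code point 127 is outside the ASCII range.
            if ord(ch) > 127:
                hits.append((line_no, col, ch))
    return hits


# Hypothetical pasted snippet: the second line starts with a
# non-breaking space (U+00A0) instead of a normal space.
sample = "class P:\n\u00a0   def handle_starttag(self, tag, attrs):\n        pass\n"

for line_no, col, ch in find_non_ascii(sample):
    print("line %d, column %d: %r" % (line_no, col, ch))
```

Replacing each reported character with a plain space (or retyping the line, as
I did) should make the warning go away.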

Thanks again for your help, no doubt I will post back later with more
questions :)

Thanks
R
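
P.S. Coming back to the subject line, here is a minimal sketch of one way the
parser above might be used to pull the text out of a page and squeeze out the
extra spaces and line wraps. It is only one reading of the original question
(it collapses whitespace in the text and drops the tags). The names
TextCollector and collapse_whitespace are invented for this example, and it
assumes Python 3, where the module is html.parser rather than HTMLParser.

```python
from html.parser import HTMLParser  # named HTMLParser in Python 2


class TextCollector(HTMLParser):
    """Hypothetical helper that gathers the text found between tags."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.chunks = []

    def handle_data(self, data):
        # Called with each run of text outside the tags.
        self.chunks.append(data)


def collapse_whitespace(html_data):
    parser = TextCollector()
    parser.feed(html_data)
    parser.close()  # flush any text the parser is still buffering
    # split() with no arguments splits on any run of whitespace,
    # including newlines, so joining with " " removes the line wraps.
    return " ".join("".join(parser.chunks).split())


html_data = """
<html>
  <head>
    <title>hi</title>
  </head>
  <body> hi
     there </body>
</html>
"""

print(collapse_whitespace(html_data))
```

If the tags themselves need to be kept, handle_starttag and handle_endtag
would have to write them back out as well; this sketch does not attempt that.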


