How do I convert escaped HTML into a string?

Just Another Victim of the Ambient Morality ihatespam at hotmail.com
Sat Nov 24 00:42:06 EST 2007


    I've done a Google search on this but, amazingly, I'm the first guy to 
ever need this!  Everyone else seems to need the reverse (escaping text into 
HTML).  Actually, I did find some people who complained about this and rolled 
their own solution, but I refuse to believe that Python doesn't have a 
built-in solution to what must be a very common problem.
    So, how do I convert HTML to plain text?  Something like this:


<div>This is a string.</div>


    ...into:


This is a string.
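    One possible sketch, assuming Python 3.5 or later: the standard library's html.parser.HTMLParser can be subclassed so that it keeps only the text between tags. The class name TextExtractor and the helper html_to_text are hypothetical names chosen for illustration, not an existing API. Because convert_charrefs defaults to True, entities such as &amp; arrive already decoded.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect the text content of HTML, dropping tags."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities
        # (e.g. &amp; -> &) before passing text to handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        self.parts.append(data)


def html_to_text(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(html_to_text("<div>This is a string.</div>"))  # This is a string.
```

    This only drops the tags themselves: it still returns the contents of <script> and <style> elements and adds no line breaks for block elements. If the input is merely escaped text with no tags, html.unescape("This &amp; that") alone returns "This & that".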


    Actually, the ideal would be a function that takes an HTML string and 
converts it into the text that the HTML would render as.  For instance, 
converting:


<div>This &    that
or the other thing.</div>


    ...into:


This & that or the other thing.


    ...since HTML rendering collapses any run of whitespace, of any kind, 
into a single space (a bizarre design choice if I've ever seen one).
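    A rough sketch for this second example, again assuming Python 3.5 or later and using the hypothetical names _TextCollector and rendered_text: extract the text nodes with html.parser.HTMLParser, then collapse whitespace. str.split() with no argument splits on any run of whitespace (spaces, tabs, newlines), so rejoining the pieces with single spaces approximates how a browser displays normal text.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    # Hypothetical helper: gathers text nodes; entities like &amp; are
    # decoded because convert_charrefs is True.
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def rendered_text(markup):
    """Approximate the text a browser would show for a simple HTML fragment."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()  # flush buffered text
    raw = "".join(collector.parts)
    # Split on any whitespace run and rejoin with single spaces,
    # mimicking HTML's default whitespace collapsing.
    return " ".join(raw.split())


print(rendered_text("<div>This &amp;    that\nor the other thing.</div>"))
# This & that or the other thing.
```

    This ignores CSS, <pre> blocks and <br> tags. Also, because U+00A0 counts as whitespace for str.split(), a &nbsp; gets collapsed too, unlike in a browser.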
    Surely, Python can already do this, right?
    Thank you... 




