Convert from unicode chars to HTML entities

Steven D'Aprano steve at REMOVEME.cybersource.com.au
Mon Jan 29 00:13:05 EST 2007


On Sun, 28 Jan 2007 23:41:19 -0500, Leif K-Brooks wrote:

>  >>> s = u"© and many more..."
>  >>> s.encode('ascii', 'xmlcharrefreplace')
> '&#169; and many more...'

Wow. That's short and to the point. I like it.

A few issues:

(1) It doesn't seem to be reversible:

>>> '&#169; and many more...'.decode('latin-1')
u'&#169; and many more...'

What should I do instead?
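
For illustration, here is a minimal sketch of one possible round trip. It assumes a current Python 3 with the standard-library html module (the sessions above are Python 2, which has neither), and the helper names to_char_refs and from_char_refs are made up for this example.

```python
import html


def to_char_refs(text):
    """Replace non-ASCII characters with numeric character references."""
    # encode() returns bytes in Python 3, so decode back to str.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_char_refs(text):
    """Turn character references (numeric or named) back into characters."""
    return html.unescape(text)


original = "\u00a9 and many more..."
escaped = to_char_refs(original)
restored = from_char_refs(escaped)

print(escaped)
print(restored == original)
```

One caveat with this sketch: html.unescape also converts references that were already present in the original text (a literal "&amp;", say), so the round trip is only exact when the input contained no references to begin with.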


(2) Are XML entities guaranteed to be the same as HTML entities?
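
For context, numeric character references such as &#169; are defined by both XML and HTML, whereas most named entities such as &copy; come from HTML (XML itself predefines only amp, lt, gt, quot and apos). The sketch below shows the difference. It assumes Python 3; the handler name "htmlentityreplace" and the function named_entity_replace are invented for this example and are not part of the standard library.

```python
import codecs
from html.entities import codepoint2name


def named_entity_replace(error):
    """Error handler: use an HTML named entity if one exists, else a numeric reference."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    pieces = []
    for ch in error.object[error.start:error.end]:
        name = codepoint2name.get(ord(ch))
        pieces.append("&%s;" % name if name else "&#%d;" % ord(ch))
    # Return the replacement text and the position at which to resume encoding.
    return "".join(pieces), error.end


# Register the hypothetical handler under a made-up name.
codecs.register_error("htmlentityreplace", named_entity_replace)

text = "\u00a9 and \u20ac and \u4e2d"
numeric = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
named = text.encode("ascii", "htmlentityreplace").decode("ascii")

print(numeric)
print(named)
```

The numeric form should be safe in either markup language; the named form is only meaningful where the HTML entity set is known.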


(3) Is there a way to find out at runtime what encoders/decoders/error
handlers are available, and what they do? 
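
As a rough sketch of what can be inspected at runtime, assuming Python 3: the codecs module can look up a codec or an error handler by name, and encodings.aliases holds a table of known aliases. As far as I know there is no public function that enumerates the registered error handlers, so the list of handler names below is simply the set of documented built-ins, written out by hand.

```python
import codecs
import encodings.aliases

# Codec names that appear in the alias table (not necessarily exhaustive).
codec_names = sorted(set(encodings.aliases.aliases.values()))

# Look up a single codec by name; an unknown name raises LookupError.
info = codecs.lookup("latin-1")
print(info.name)

# Error handlers can be fetched by name, one at a time.
builtin_handlers = ["strict", "ignore", "replace",
                    "xmlcharrefreplace", "backslashreplace"]
available = {}
for handler_name in builtin_handlers:
    try:
        available[handler_name] = codecs.lookup_error(handler_name)
    except LookupError:
        available[handler_name] = None

print(len(codec_names), "codec names found")
print(sorted(name for name, fn in available.items() if fn is not None))
```

What each handler does is described in the documentation for the codecs module rather than being discoverable from the handler objects themselves.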


Thanks,


-- 
Steven D'Aprano 



