Convert from unicode chars to HTML entities

Mon Jan 29 00:13:05 EST 2007

On Sun, 28 Jan 2007 23:41:19 -0500, Leif K-Brooks wrote:

>  >>> s = u"© and many more..."
>  >>> s.encode('ascii', 'xmlcharrefreplace')
> '&#169; and many more...'

Wow. That's short and to the point. I like it.

A few issues:

(1) It doesn't seem to be reversible:

>>> '&#169; and many more...'.decode('latin-1')
u'&#169; and many more...'

What should I do instead?
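
One possible way to reverse it, sketched for a current Python 3 (3.4 or later) rather than the interpreter used in this thread: decode the bytes as ASCII, then let html.unescape() expand the character references. The helper name decode_charrefs is made up for illustration.

```python
import html


def decode_charrefs(data):
    """Turn ASCII bytes containing character references back into text.

    decode_charrefs is a hypothetical helper name, not a stdlib function.
    """
    # The bytes are plain ASCII, so this decode step cannot lose anything.
    text = data.decode("ascii")
    # html.unescape() expands numeric references such as &#169; and also
    # named ones such as &amp;, so it is not a strict inverse if the
    # original text already contained literal references.
    return html.unescape(text)


original = "\u00a9 and many more..."
encoded = original.encode("ascii", "xmlcharrefreplace")
restored = decode_charrefs(encoded)
```

Here restored compares equal to original, because the only reference in the encoded bytes is the one that the error handler produced.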

(2) Are XML entities guaranteed to be the same as HTML entities?
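
For what it is worth, numeric character references such as &#169; mean the same thing in XML and HTML. The named ones differ: XML predefines only five (lt, gt, amp, quot, apos), while HTML defines many more, such as &copy;. If named HTML entities are wanted, one rough sketch for Python 3 is a custom error handler built on html.entities.codepoint2name. The handler name "htmlentityreplace" is invented here and is not part of the standard library.

```python
import codecs
from html.entities import codepoint2name


def html_entity_replace(exc):
    """Error handler: use named HTML entities, else numeric references."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    pieces = []
    for ch in exc.object[exc.start:exc.end]:
        name = codepoint2name.get(ord(ch))
        if name is not None:
            pieces.append("&%s;" % name)      # e.g. &copy;
        else:
            pieces.append("&#%d;" % ord(ch))  # same form as xmlcharrefreplace
    # Return the replacement text and the position at which to resume.
    return "".join(pieces), exc.end


# "htmlentityreplace" is a hypothetical name chosen for this example.
codecs.register_error("htmlentityreplace", html_entity_replace)

named = "\u00a9 and many more...".encode("ascii", "htmlentityreplace")
```

The output of this handler is only safe to feed to an HTML parser; a plain XML parser would reject &copy; unless a DTD declares it.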

(3) Is there a way to find out at runtime what encoders/decoders/error
handlers are available, and what they do? 
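
As far as I can tell there is no single official call that lists every codec or error handler, so the following is only a sketch of how one might probe at runtime in Python 3. It assumes the standard codecs live as modules in the encodings package; codecs registered elsewhere will not show up in the listing. The function names are made up for illustration.

```python
import codecs
import encodings
import pkgutil


def list_codec_modules():
    """Names of the codec modules shipped in the encodings package."""
    names = {m.name for m in pkgutil.iter_modules(encodings.__path__)}
    names.discard("aliases")  # the alias table, not a codec
    return sorted(names)


def has_codec(name):
    """True if codecs.lookup() can find an encoder/decoder for name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


def has_error_handler(name):
    """True if an error handler is registered under name."""
    try:
        codecs.lookup_error(name)
    except LookupError:
        return False
    return True


# Probe the handler names documented for the codecs module.
documented = ["strict", "ignore", "replace",
              "xmlcharrefreplace", "backslashreplace"]
available = [n for n in documented if has_error_handler(n)]
```

What each handler does still has to come from the codecs documentation or from the handler's docstring, since lookup_error() only returns the callable.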

Thanks,

-- 
Steven D'Aprano