[Python-Dev] forwarded message from aahz@panix.com

Thu, 05 Oct 2000 16:07:29 -0500

> Yeah, I missed that earlier.  But after thinking some more, there are a
> fair number of browser-like bits of software that fail to render many of
> the special characters correctly (e.g. trademark).  This is frequently
> due to character set issues; entities almost always render correctly,
> though.  Therefore a general translation routine is probably handy.

But character set issues also make it impossible to provide such a
translation routine: the current party line is that the encoding of
8-bit strings is unknown and that only ASCII can be assumed.
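The following sketch makes the encoding problem concrete. It is written for modern Python 3, not the 2.0 APIs under discussion, and it picks cp1252 and Latin-1 purely as illustrative guesses. The same byte decodes to different characters depending on the encoding you assume, so a routine that sees only bytes cannot know which entity, if any, to emit.

```python
# The single byte 0x99, read under two different assumed encodings.
raw = b"\x99"

as_cp1252 = raw.decode("cp1252")   # Windows-1252 maps 0x99 to U+2122 (trademark sign)
as_latin1 = raw.decode("latin-1")  # Latin-1 maps 0x99 to U+0099, a C1 control character

print(hex(ord(as_cp1252)), hex(ord(as_latin1)))
```

Only the first reading would call for "&trade;". Under the second there is nothing sensible to emit.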

> cgi.escape() only handles "&", "<", ">".  I'm not sure whether cgi.escape
> ought to be expanded to handle all characters or a new routine should be
> added.  Martin v. Loewis suggested xml.sax.saxutils.escape(), but I
> have zero familiarity with XML and am waiting for 2.0final.  Perhaps
> this should be taken off-line?

xml.sax.saxutils.escape() is a generalization of cgi.escape() -- read
the source (Lib/xml/sax/saxutils.py).  It allows you to specify
additional things to be replaced by entities by passing in a
dictionary mapping chars (or strings) to entities.
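The short sketch below, in Python 3, shows both modes of xml.sax.saxutils.escape(). The sample string and the single trademark entry are illustrative choices. With no dictionary, only "&", "<" and ">" are replaced, as in cgi.escape(). The optional dictionary adds further replacements, which are applied after the built-in three, so "&trade;" is not double-escaped.

```python
from xml.sax.saxutils import escape

text = "Spam\u2122 & <eggs>"

# Default behaviour: only "&", "<" and ">" are replaced.
basic = escape(text)

# Extra replacements: a dictionary mapping characters (or strings) to entities.
extended = escape(text, {"\u2122": "&trade;"})

print(basic)     # Spam™ &amp; &lt;eggs&gt;
print(extended)  # Spam&trade; &amp; &lt;eggs&gt;
```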

If you know that you are dealing with Latin-1, you could use the table
in htmlentitydefs.py to construct such a dictionary.
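Here is a minimal sketch of building that dictionary, assuming Latin-1 input. It uses Python 3, where htmlentitydefs was renamed html.entities and its codepoint2name table maps integer code points to entity names. Restricting the table to 0xA0-0xFF is an illustrative choice that covers the non-ASCII half of Latin-1. The sample bytes are made up for the example.

```python
from html.entities import codepoint2name
from xml.sax.saxutils import escape

# Entity table for the upper half of Latin-1 (code points 0xA0-0xFF).
# The range filter also excludes "&", "<" and ">", which escape() handles itself.
latin1_entities = {
    chr(cp): "&%s;" % name
    for cp, name in codepoint2name.items()
    if 0xA0 <= cp <= 0xFF
}

# Decoding the bytes is where the Latin-1 assumption is actually made.
data = b"Caf\xe9 \xa9 2000 & co.".decode("latin-1")
print(escape(data, latin1_entities))  # Caf&eacute; &copy; 2000 &amp; co.
```

If the bytes were really cp1252 or UTF-8, the decode step would silently produce the wrong characters. That is the same caveat raised about 8-bit strings above.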

--Guido van Rossum (home page: http://www.python.org/~guido/)