[Python-Dev] forwarded message from aahz@panix.com
Guido van Rossum
guido@python.org
Thu, 05 Oct 2000 16:07:29 -0500
> Yeah, I missed that earlier. But after thinking some more, there are a
> fair number of browser-like bits of software that fail to render many of
> the special characters correctly (e.g. trademark). This is frequently
> due to character set issues; entities almost always render correctly,
> though. Therefore a general translation routine is probably handy.
But character set issues also make it impossible to provide such a
translation routine: the current party line is that the encoding of
8-bit strings is unknown and that only ASCII can be assumed.
> cgi.escape() only handles "&", "<", ">". I'm not sure whether cgi.escape
> ought to be expanded to handle all characters or a new routine should be
> added. Martin von Loewis suggested xml.sax.saxutils.escape(), but I
> have zero familiarity with XML and am waiting for 2.0final. Perhaps
> this should be taken off-line?
xml.sax.saxutils.escape() is a generalization of cgi.escape() -- read
the source (Lib/xml/sax/saxutils.py). It allows you to specify
additional things to be replaced by entities by passing in a
dictionary mapping chars (or strings) to entities.
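As a minimal sketch of that dictionary argument (the extra mapping and the helper name escape_with_extras are illustrative assumptions, not part of the library):

```python
from xml.sax.saxutils import escape

# Hypothetical extra replacements, applied after the built-in
# handling of "&", "<" and ">".
EXTRA_ENTITIES = {
    '"': "&quot;",
    "\u2122": "&trade;",  # trademark sign
}

def escape_with_extras(text):
    """Escape &, <, > plus everything listed in EXTRA_ENTITIES."""
    return escape(text, EXTRA_ENTITIES)

print(escape_with_extras('AT&T <b>"Python"\u2122</b>'))
```

Because escape() replaces "&" before it consults the dictionary, the ampersands in the replacement entities are not escaped a second time.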
If you know that you are dealing with Latin-1, you could use the table
in htmlentitydefs.py to construct such a dictionary.
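A rough sketch of building such a table, assuming the input really is Latin-1. It relies on the codepoint2name mapping, which later releases ship in htmlentitydefs (html.entities in Python 3); the names LATIN1_ENTITIES and escape_latin1 are made up for illustration:

```python
try:
    from html.entities import codepoint2name   # Python 3 module name
except ImportError:
    from htmlentitydefs import codepoint2name  # Python 2 module name

from xml.sax.saxutils import escape

# Map each Latin-1 character in the range 160-255 that has a named
# HTML entity to that entity, e.g. chr(0xE9) -> "&eacute;".
LATIN1_ENTITIES = {
    chr(cp): "&%s;" % name
    for cp, name in codepoint2name.items()
    if 160 <= cp <= 255
}

def escape_latin1(text):
    """Escape &, <, > and the non-ASCII Latin-1 characters as entities."""
    return escape(text, LATIN1_ENTITIES)

print(escape_latin1("caf\xe9 & \xa9 2000 <x>"))
```

This only helps when the encoding is known; for 8-bit strings of unknown encoding the table would map bytes to the wrong entities.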
--Guido van Rossum (home page: http://www.python.org/~guido/)