XML (XHTML) character entities and PyXML

andrew cooke andrew at acooke.org
Tue May 7 12:59:55 EDT 2002


Hi,

I'm processing XML (XHTML) in Python and have hit a major problem:
character entities appear to be silently dropped.  As far as I
understand the DOM docs, they should be translated into the
corresponding Unicode characters, but instead they simply disappear.
For example, the code:

PrettyPrint(FromXmlStream(...))

changes
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator"
content="HTML Tidy for Cygwin (vers 1st April 2002), see www.w3.org"
/>
<link type="text/css" rel="stylesheet" href="basic.css" />
<title>Index</title>
</head>
<body>
<h1>&iexcl;Hola!</h1>
<a href="init">initialisaci&oacute;n</a>
</body>
</html>

to

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE html>
<html xmlns='http://www.w3.org/1999/xhtml'>
<head>
<meta content='HTML Tidy for Cygwin (vers 1st April 2002), see
www.w3.org' name='generator'/>
<link href='basic.css' type='text/css' rel='stylesheet'/>
<title>Index</title>
</head>
<multi:body>
<h1>Hola!</h1>
<a href='init'>initialisacin</a>
</multi:body>
</html>

The upside-down exclamation mark before "Hola!" and the accented "o"
in "initialisación" have simply disappeared!
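
To make the expected behaviour concrete, here is a minimal sketch using
the standard library's xml.dom.minidom rather than the PyXML reader
above.  It rests on two assumptions: the entities are declared in an
internal DTD subset so the parser can resolve them without fetching the
external XHTML DTD, and the helper name element_text is made up for this
illustration.  With those declarations in place, the text nodes should
come back containing the Unicode characters themselves.

```python
from xml.dom.minidom import parseString

# Assumption for this sketch: the two entities are declared inline, so the
# parser does not need to load the external XHTML DTD to resolve them.
SOURCE = """<!DOCTYPE html [
  <!ENTITY iexcl "&#161;">
  <!ENTITY oacute "&#243;">
]>
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<h1>&iexcl;Hola!</h1>
<a href="init">initialisaci&oacute;n</a>
</body>
</html>"""


def element_text(node):
    """Concatenate the text nodes directly under `node` (hypothetical helper)."""
    return "".join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE)


doc = parseString(SOURCE)
heading = element_text(doc.getElementsByTagName("h1")[0])
link = element_text(doc.getElementsByTagName("a")[0])

# Print escaped forms so the result is readable on any terminal encoding.
print(ascii(heading), ascii(link))
```

If the entities are resolved, the heading starts with U+00A1 and the
link text contains U+00F3, instead of those characters going missing.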

Please help!  I'm working in Spanish, and those characters are very
important!

Thanks,
Andrew


