XML (XHTML) character entities and PyXML

andrew cooke andrew at acooke.org
Wed May 8 10:37:47 EDT 2002


martin at v.loewis.de (Martin v. Loewis) wrote in message news:<m3it5z6w72.fsf at mira.informatik.hu-berlin.de>...
> andrew at acooke.org (andrew cooke) writes:
[...]
> > i tried this.  at least, i tried implementing the interface and using
> > a method that i thought would set the entityresolver on the parser,
> > but the method on the class was never called (sorry for the lack of
> > details - it was at work - i believe i used setEntityResolver and
> > implemented the single method in EntityResolver as a simple "print",
> > but nothing printed).
> 
> That is supposed to work; you'll need to provide details to analyse
> what went wrong.

Hi,

Now at work, here are the details:

test.py:
from xml.sax import sax2exts
from xml.dom.ext import PrettyPrint
from xml.dom.ext.reader.Sax2 import FromXmlFile, Reader
import sys

class Resolver:
    def resolveEntity(self, publicId, systemId):
        print "resolve",publicId,systemId

file = "../xhtml/index.xhtml"
sys.stdout.writelines(open(file, "r").readlines())
#print file
#PrettyPrint(FromXmlFile(file))

parser = sax2exts.XMLParserFactory.make_parser()
parser.setEntityResolver(Resolver())
reader = Reader(parser=parser)
PrettyPrint(reader.fromStream(open(file)))
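
For comparison, here is a minimal sketch of a resolver that returns something rather than only printing. It assumes a current Python 3 and the standard-library xml.sax (expat) rather than PyXML, so it does not reproduce the setup in test.py. LocalDtdResolver, TextCollector, parse_with_resolver and the two-entity LOCAL_DTD are hypothetical names made up for illustration. SAX expects resolveEntity to return a system identifier or an InputSource, and as far as I know recent Python versions only consult the resolver when feature_external_ges is switched on.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Stand-in for the external DTD subset: only the two entities used below.
# (Assumption: the real XHTML DTD declares many more.)
LOCAL_DTD = b'<!ENTITY iexcl "&#161;">\n<!ENTITY oacute "&#243;">\n'

DOCUMENT = (
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
    b'    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    b'<html><body><h1>&iexcl;Hola!</h1>'
    b'<a href="init">initialisaci&oacute;n</a></body></html>'
)


class LocalDtdResolver(EntityResolver):
    """Hypothetical resolver: serves LOCAL_DTD instead of fetching the DTD."""

    def __init__(self):
        self.requests = []

    def resolveEntity(self, publicId, systemId):
        # Record the request, then hand back an InputSource backed by bytes
        # so that nothing is read from the network.
        self.requests.append((publicId, systemId))
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(LOCAL_DTD))
        return source


class TextCollector(ContentHandler):
    """Collects character data in document order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_with_resolver(data):
    parser = xml.sax.make_parser()
    # Without this feature the expat reader skips external entities,
    # including the external DTD subset, and never calls the resolver.
    parser.setFeature(feature_external_ges, True)
    resolver = LocalDtdResolver()
    collector = TextCollector()
    parser.setEntityResolver(resolver)
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(data))
    return "".join(collector.chunks), resolver.requests


if __name__ == "__main__":
    text, requests = parse_with_resolver(DOCUMENT)
    print(requests)
    print(text)
```

The resolver is asked for the external DTD subset named in the DOCTYPE, not for each character entity; the entities are then expanded from whatever declarations that subset contains.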

and the output of test.py (note that the entities are not printed in the
second listing, which is missing the inverted exclamation mark and the
accented "o", and that "resolve" never appears):

F:\home\Andrew\multi\src\python>python test.py
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator"
content="HTML Tidy for Cygwin (vers 1st April 2002), see www.w3.org"
/>
<link type="text/css" rel="stylesheet" href="basic.css" />
<title>Index</title>
</head>
<body>
<h1>¡Hola!</h1>

<a href="init">initialisación</a>
</body>
</html>

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE html>
<html xmlns='http://www.w3.org/1999/xhtml'>
  <head>
    <meta content='HTML Tidy for Cygwin (vers 1st April 2002), see www.w3.org' name='generator'/>
    <link href='basic.css' rel='stylesheet' type='text/css'/>
    <title>Index</title>
  </head>
  <body>
    <h1>Hola!</h1>
    <a href='init'>initialisacin</a>
  </body>
</html>
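
The dropped characters in the second listing are what one would expect if the parser skipped entities it had no declaration for. As a rough illustration, and not a fix for the PyXML setup, the sketch below catches skipped entities with the standard-library xml.sax in Python 3 and fills them in from html.entities. EntityAwareCollector and extract_text are hypothetical names, and the inline DOCUMENT is a cut-down stand-in for index.xhtml.

```python
import io
import xml.sax
from html.entities import name2codepoint
from xml.sax.handler import ContentHandler, feature_external_ges

DOCUMENT = (
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
    b'    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    b'<html><body><h1>&iexcl;Hola!</h1>'
    b'<a href="init">initialisaci&oacute;n</a></body></html>'
)


class EntityAwareCollector(ContentHandler):
    """Collects text and substitutes known HTML entities that were skipped."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skipped = []

    def characters(self, content):
        self.chunks.append(content)

    def skippedEntity(self, name):
        # Called when the parser meets an entity it has no declaration for.
        self.skipped.append(name)
        codepoint = name2codepoint.get(name)
        if codepoint is not None:
            self.chunks.append(chr(codepoint))


def extract_text(data):
    parser = xml.sax.make_parser()
    # Leave the external DTD unread, so the entities stay undeclared.
    parser.setFeature(feature_external_ges, False)
    handler = EntityAwareCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return "".join(handler.chunks), handler.skipped


if __name__ == "__main__":
    text, skipped = extract_text(DOCUMENT)
    print(skipped)
    print(text)
```

This only covers entities in element content and only names that html.entities knows about; reading the declarations through an entity resolver is the more general route.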


