XML (XHTML) character entities and PyXML
andrew cooke
andrew at acooke.org
Wed May 8 10:37:47 EDT 2002
martin at v.loewis.de (Martin v. Loewis) wrote in message news:<m3it5z6w72.fsf at mira.informatik.hu-berlin.de>...
> andrew at acooke.org (andrew cooke) writes:
[...]
> > i tried this. at least, i tried implementing the interface and using
> > a method that i thought would set the entityresolver on the parser,
> > but the method on the class was never called (sorry for the lack of
> > details - it was at work - i believe i used setEntityResolver and
> > implemented the single method in EntityResolver as a simple "print",
> > but nothing printed).
>
> That is supposed to work; you'll need to provide details to analyse
> what went wrong.
Hi,
I'm at work now, so here are the details:
test.py:
from xml.sax import sax2exts
from xml.dom.ext import PrettyPrint
from xml.dom.ext.reader.Sax2 import FromXmlFile, Reader
import sys

class Resolver:
    def resolveEntity(self, publicId, systemId):
        print "resolve", publicId, systemId
        # hand the system id back so normal resolution can continue
        return systemId

file = "../xhtml/index.xhtml"
# first run: echo the raw file
sys.stdout.writelines(open(file, "r").readlines())
#print file
#PrettyPrint(FromXmlFile(file))
# second run: parse with the resolver installed, then pretty-print the DOM
parser = sax2exts.XMLParserFactory.make_parser()
parser.setEntityResolver(Resolver())
reader = Reader(parser=parser)
PrettyPrint(reader.fromStream(open(file)))
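For comparison, here is a minimal sketch of the same resolver wiring using the standard library's xml.sax in Python 3 (not PyXML, so sax2exts and xml.dom.ext.reader are not involved). It assumes index.xhtml uses the named references &iexcl; and &oacute;; DOC is a trimmed stand-in for that file, and MINI_DTD is a hypothetical two-entity substitute for the real XHTML DTD. Since Python 3.7.1 the expat-based reader leaves feature_external_ges off by default, and while it is off resolveEntity is never called, not even for the external DTD subset.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Trimmed stand-in for ../xhtml/index.xhtml (assumed to use named entities).
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<body><h1>&iexcl;Hola!</h1><a href="init">initialisaci&oacute;n</a></body>
</html>
"""

# Hypothetical substitute for the real XHTML DTD: declares only the two
# entities that DOC uses.
MINI_DTD = b'<!ENTITY iexcl "&#161;">\n<!ENTITY oacute "&#243;">\n'


class Resolver(EntityResolver):
    def __init__(self):
        self.calls = []

    def resolveEntity(self, publicId, systemId):
        print("resolve", publicId, systemId)
        self.calls.append((publicId, systemId))
        # Serve the DTD from memory instead of fetching it from w3.org.
        source = InputSource()
        source.setByteStream(io.BytesIO(MINI_DTD))
        return source


class TextCollector(ContentHandler):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse(load_external_dtd):
    """Parse DOC; return (resolver calls, concatenated character data)."""
    parser = xml.sax.make_parser()
    resolver = Resolver()
    handler = TextCollector()
    parser.setEntityResolver(resolver)
    parser.setContentHandler(handler)
    # Off by default since Python 3.7.1; while it is off the resolver is
    # never consulted, not even for the external DTD subset.
    parser.setFeature(feature_external_ges, load_external_dtd)
    parser.parse(io.BytesIO(DOC))
    return resolver.calls, "".join(handler.chunks)


if __name__ == "__main__":
    for flag in (False, True):
        calls, text = parse(flag)
        print(flag, len(calls), text.strip())
```

With the feature off, "resolve" never appears and the text comes out without the two characters, much like the second run below; with it on, the resolver is called once for the DTD and the references expand. Serving the DTD from memory also keeps the parser off the network.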
And the output of test.py (note that "resolve" is never printed, and that the second run drops the inverted exclamation mark and the accented "o"):
F:\home\Andrew\multi\src\python>python test.py
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator"
content="HTML Tidy for Cygwin (vers 1st April 2002), see www.w3.org"
/>
<link type="text/css" rel="stylesheet" href="basic.css" />
<title>Index</title>
</head>
<body>
<h1>¡Hola!</h1>
<a href="init">initialisación</a>
</body>
</html>
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE html>
<html xmlns='http://www.w3.org/1999/xhtml'>
<head>
<meta content='HTML Tidy for Cygwin (vers 1st April 2002), see www.w3.org' name='generator'/>
<link href='basic.css' rel='stylesheet' type='text/css'/>
<title>Index</title>
</head>
<body>
<h1>Hola!</h1>
<a href='init'>initialisacin</a>
</body>
</html>
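The missing characters fit a pattern that today's expat-based xml.sax reader also shows: when a document names an external DTD that the parser does not read, undeclared references such as &iexcl; are not treated as errors but are reported as skipped entities, so their text silently disappears. Below is a minimal sketch, assuming the file uses those two named references and Python 3.7.1+ defaults; whether PyXML in 2002 dropped them for the same reason is an assumption, not something this output proves. DOC and SkipReporter are illustrative names.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Trimmed stand-in for ../xhtml/index.xhtml (assumed to use named entities).
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<body><h1>&iexcl;Hola!</h1><a href="init">initialisaci&oacute;n</a></body>
</html>
"""


class SkipReporter(ContentHandler):
    """Collects character data and the names of entities the parser skipped."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skipped = []

    def characters(self, content):
        self.chunks.append(content)

    def skippedEntity(self, name):
        # Reported for references to entities whose declaration was never read.
        self.skipped.append(name)


handler = SkipReporter()
# Default settings: the external DTD is not fetched, so &iexcl; and &oacute;
# stay undeclared and are skipped rather than raising an error.
xml.sax.parseString(DOC, handler)
print("skipped:", handler.skipped)
print("text:", "".join(handler.chunks).strip())
```

Overriding skippedEntity is a cheap way to notice the loss instead of ending up with silently truncated text.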