Problem: PyXML 0.7 PrettyPrinting HTML, latin-1 in comments.
Syver Enstad
syver-en+usenet at online.no
Thu Jan 3 15:39:32 EST 2002
Running this code:
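# Imports were omitted in the original post; these are assumed.
import os
from xml import dom
import xml.dom.ext.reader.HtmlLib  # makes dom.ext.reader.HtmlLib available

reader = dom.ext.reader.HtmlLib.Reader()
domObject = reader.fromUri(
    os.path.expanduser('~/kode/pythonscript/index.html'))
htmlDom = dom.ext.StripHtml(domObject)
dom.ext.PrettyPrint(domObject)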
On this file:
<html>
<body>
<a href="http://www.wholenote.com/member/mymusic.asp">Min side på whole note</a>
<!-- Bare en test for se om alt er som det skal -->
</body>
</html>
Works fine.
But on this file:
<html>
<body>
<a href="http://www.wholenote.com/member/mymusic.asp">Min side på whole note</a>
<!-- Bare en test for å se om alt er som det skal -->
</body>
</html>
It says:
  File "D:\devtools\Python21\_xmlplus\dom\ext\Printer.py", line 356, in visitComment
    self._write('<!--%s-->' % (node.data))
  File "D:\devtools\Python21\_xmlplus\dom\ext\Printer.py", line 146, in _write
    obj = utf8_to_code(text, self.encoding)
  File "D:\devtools\Python21\_xmlplus\dom\ext\Printer.py", line 45, in utf8_to_code
    text = unicode(text, "utf-8")
UnicodeError: UTF-8 decoding error: invalid data
Notice that the å character that trips up the call to PrettyPrint also
appears in the text of the <a> tag without causing trouble. In the printed
output, when printing succeeds, the å in the <a> tag is replaced by an
HTML character escape (an entity or numeric character reference).
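The traceback suggests that the comment text reaches utf8_to_code as raw Latin-1 bytes and is then decoded as UTF-8. Below is a minimal sketch of that failure outside PyXML, written for a current Python 3 rather than the Python 2.1 used above. The helper names decodes_as_utf8 and latin1_to_utf8 are made up for illustration and are not part of PyXML. Whether recoding the file to UTF-8 before handing it to the reader is enough for PyXML is untested here.

```python
# Sketch only: plain Python 3, no PyXML involved.
# "\xe5" is the character å; encoded as Latin-1 it becomes the single byte 0xE5.
LATIN1_COMMENT = "<!-- Bare en test for \xe5 se om alt er som det skal -->".encode("latin-1")


def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if raw is valid UTF-8, False otherwise (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def latin1_to_utf8(raw: bytes) -> bytes:
    """Recode Latin-1 bytes as UTF-8 bytes (hypothetical helper)."""
    return raw.decode("latin-1").encode("utf-8")


if __name__ == "__main__":
    # The byte 0xE5 followed by a space is not a valid UTF-8 sequence.
    print(decodes_as_utf8(LATIN1_COMMENT))                  # False
    # After recoding, å is the two-byte sequence 0xC3 0xA5, which is valid.
    print(decodes_as_utf8(latin1_to_utf8(LATIN1_COMMENT)))  # True
```

If this assumption is right, the same decode error would be expected for any non-ASCII Latin-1 byte inside a comment, not just å.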
My question is: what should I do to successfully print HTML files that
are Latin-1 encoded with PyXML? Is it possible?
--
Kind regards
Syver Enstad