Problem: PyXML 0.7 PrettyPrinting HTML, latin-1 in comments.

Syver Enstad syver-en+usenet at online.no
Thu Jan 3 15:39:32 EST 2002


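Running this code:
    # Imports assumed; the original snippet omitted them.
    import os
    from xml import dom
    import xml.dom.ext.reader.HtmlLib  # also loads dom.ext and dom.ext.reader

    reader = dom.ext.reader.HtmlLib.Reader()
    domObject = reader.fromUri(
      os.path.expanduser('~/kode/pythonscript/index.html'))
    htmlDom = dom.ext.StripHtml(domObject)
    dom.ext.PrettyPrint(domObject)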

On this file:
<html>
<body>

<a href="http://www.wholenote.com/member/mymusic.asp">Min side på whole note</a>
<!--  Bare en test for  se om alt er som det skal -->
</body>
</html>

Works fine.

But on this file:
<html>
<body>

<a href="http://www.wholenote.com/member/mymusic.asp">Min side på whole note</a>
<!--  Bare en test for å se om alt er som det skal -->
</body>
</html>

It says:

  File "D:\devtools\Python21\_xmlplus\dom\ext\Printer.py", line 356, in visitComment
    self._write('<!--%s-->' % (node.data))
  File "D:\devtools\Python21\_xmlplus\dom\ext\Printer.py", line 146, in _write
    obj = utf8_to_code(text, self.encoding)
  File "D:\devtools\Python21\_xmlplus\dom\ext\Printer.py", line 45, in utf8_to_code
    text = unicode(text, "utf-8")
UnicodeError: UTF-8 decoding error: invalid data

Notice that the å character that trips up the call to PrettyPrint also
appears in the text of the a tag without causing trouble. Also, when
printing succeeds, the å in the a tag is replaced in the printed output
by a character entity escape.
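
The traceback is consistent with the comment text reaching utf8_to_code
as raw Latin-1 bytes, which are then decoded as UTF-8. Below is a minimal
sketch of that failure mode. It is written for present-day Python 3 rather
than the Python 2.1 in the traceback, it does not use PyXML at all, and
the helper names decodes_as_utf8 and latin1_to_utf8 are made up for
illustration:

```python
# Comment text from the failing file; "\xe5" is the character å.
comment = " Bare en test for \xe5 se om alt er som det skal "

# In a Latin-1 file, å is stored as the single byte 0xE5.
latin1_bytes = comment.encode("latin-1")


def decodes_as_utf8(data):
    """Return True if data is a valid UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def latin1_to_utf8(data):
    """Re-encode Latin-1 bytes as UTF-8 (hypothetical helper)."""
    return data.decode("latin-1").encode("utf-8")


utf8_bytes = latin1_to_utf8(latin1_bytes)

print(decodes_as_utf8(latin1_bytes))  # False: 0xE5 followed by a space is not valid UTF-8
print(decodes_as_utf8(utf8_bytes))    # True: å is now the two bytes 0xC3 0xA5
```

Whether converting the comment data this way before printing is enough to
make PrettyPrint succeed is an assumption that this sketch does not test.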

My question is: What should I do to successfully print Latin-1 encoded
HTML files with PyXML? Is it possible?

-- 

Kind regards

Syver Enstad


