[XML-SIG] Errors when using PrettyPrint Class (xml.dom.ext) and latin-1 characters (iso-8859-1) ...

Michel Charest mcharest at sogetel.net
Mon Dec 12 16:51:11 CET 2005


Hi,

I seem to be having a problem when using the PrettyPrint class
(from PyXML's xml.dom.ext) to generate an XML document containing latin-1
characters (specifically, accented French text).

PYTHON CODE USED:
================
# -*- coding: iso-8859-1 -*-
# XML Generation example
from xml.dom import implementation
from xml.dom.ext import PrettyPrint
import StringIO

# Create an XML document:
doc  = implementation.createDocument(None, 'cases', None)
root = doc.documentElement
celem = doc.createElement('case')
pelem = doc.createElement('problem')
felem = doc.createElement('feature')
felem.appendChild(doc.createTextNode( "élève" ))                       # Method1 (see ERROR 1)
#felem.appendChild(doc.createTextNode( unicode("élève", 'latin-1') ))  # Method2 (see ERROR 2)
felem.setAttribute("fid", "1")
pelem.appendChild(felem)
celem.appendChild(pelem)
root.appendChild(celem)
root.setAttribute("date", "jan 01 2005")

# Print generated XML document:
xml_str = StringIO.StringIO()
PrettyPrint(doc, xml_str)                            # failure point ?
print xml_str.getvalue()

ERROR 1 - OBTAINED WHEN ONLY USING -*- coding: iso-8859-1 -*- AS A SCRIPT HEADER:
==================================================================================
---------- Capture Output ----------
> "C:\Program Files\Python24\python.exe" genxml.py
Traceback (most recent call last):
  File "genxml.py", line 26, in ?
    PrettyPrint(doc, xml_str)                            # failure point ?
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\__init__.py", line 81, in PrettyPrint
    Printer.PrintWalker(visitor, root).run()
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 385, in run
    return self.step()
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 381, in step
    self.visitor.visit(self.start_node)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 185, in visit
    return self.visitDocument(node)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 231, in visitDocument
    self.visitNodeList(node.childNodes, exclude=node.doctype)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 201, in visitNodeList
    curr is not exclude and self.visit(curr)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 161, in visit
    return self.visitElement(node)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 270, in visitElement
    self.visitNodeList(node.childNodes)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 201, in visitNodeList
    curr is not exclude and self.visit(curr)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 161, in visit
    return self.visitElement(node)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 270, in visitElement
    self.visitNodeList(node.childNodes)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 201, in visitNodeList
    curr is not exclude and self.visit(curr)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 161, in visit
    return self.visitElement(node)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 270, in visitElement
    self.visitNodeList(node.childNodes)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 201, in visitNodeList
    curr is not exclude and self.visit(curr)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 161, in visit
    return self.visitElement(node)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 270, in visitElement
    self.visitNodeList(node.childNodes)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 201, in visitNodeList
    curr is not exclude and self.visit(curr)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 167, in visit
    return self.visitText(node)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 293, in visitText
    text = TranslateCdata(text, self.encoding)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 118, in TranslateCdata
    new_string = charsetHandler(new_string, encoding)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 44, in utf8_to_code
    text = unicode(text, "utf-8")
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data
> Terminated with exit code 1.

ERROR 2 - WHEN EXPLICITLY ENCODING MY STRING AS UNICODE:
========================================================
---------- Capture Output ----------
> "C:\Program Files\Python24\python.exe" genxml.py
<?xml version='1.0' encoding='UTF-8'?>
<cases date='jan 01 2005'>
  <case>
    <problem>
      <feature fid='1'>éleve</feature>
    </problem>
  </case>
</cases>
> Terminated with exit code 0.

COMMENT: As can be seen, when using Method1 (a plain string literal in a
source file declared as iso-8859-1), I get a UnicodeDecodeError. When using
Method2, explicitly converting the string with unicode("élève", 'latin-1'),
PrettyPrint does not raise an exception, but it garbles (does not correctly
render) my latin-1 string (i.e. élève).
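
A minimal sketch of what may be going on at the byte level, independent of
PyXML. It is written for a current Python 3 interpreter rather than the
Python 2.4 used above, and the helper name decodes_as_utf8 is hypothetical.
It only mimics the unicode(text, "utf-8") call shown in the last traceback
frame. The cp1252 viewer is an assumption about how the captured output is
displayed, not something confirmed here.

```python
# Illustration only: plain Python 3, no PyXML involved.
word = "\u00e9l\u00e8ve"                  # the text "eleve" with its accents

# Method1: a latin-1 source file hands the DOM a latin-1 byte string.
latin1_bytes = word.encode("latin-1")     # b'\xe9l\xe8ve'


def decodes_as_utf8(data):
    """Hypothetical helper: True if `data` is valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# latin-1 bytes for accented letters are not valid UTF-8 sequences.
print(decodes_as_utf8(latin1_bytes))

# Method2: a unicode string can be serialised as UTF-8 without error ...
utf8_bytes = word.encode("utf-8")         # b'\xc3\xa9l\xc3\xa8ve'
print(decodes_as_utf8(utf8_bytes))

# ... but a viewer that assumes cp1252 (assumption) shows two characters
# for every accented letter.
as_seen_in_cp1252 = utf8_bytes.decode("cp1252")
print(ascii(as_seen_in_cp1252))
```

If this reading is right, Method2 produces correct UTF-8 and only the
display is off, while Method1 fails because the printer expects byte
strings to already be UTF-8.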

EXTRA DETAILS:
==============
* Running on Windows XP (sp2)
* Python 2.4.2
* PyXML 0.8.4
* 4Suite 1.0b1
* I have tried many other encodings such as utf8, utf-16, utf16-le, etc.,
with no luck!

Any comments or suggestions would be most appreciated.

Regards,
Michel


