XML: minidom toxml() does not work for non English files! :-(
M.-A. Lemburg
mal at lemburg.com
Tue May 7 05:33:59 EDT 2002
Trent Mick wrote:
>
> Michael, Marc-Andre,
>
> Perhaps you could help me shed some light on this. There are two issues
> that I see:
> 1. The actual problem that Jarosław reported.
> > from xml.dom import minidom
> > xmldoc = minidom.parse('myfile.xml')
> > print xmldoc.toxml()
> >
> > It works fine for 7-bit text. But the problem is that it works ONLY for
> > pure ASCII text. :-( If I try to use any non-English characters,
> > Python raises an exception:
> >
> > UnicodeError: ASCII encoding error: ordinal not in range(128)
>
> Jarosław mentions that the problem goes away if he replaces
> ActivePython 2.2.1's StringIO.py with the one from the PythonLabs
> distro. That would be fine (a bug in ActivePython) except that
> ActivePython has the more *recent* StringIO.py. So is Jarosław
> misusing StringIO.py or is this StringIO.py checkin incorrect or
> am I confused:
> MAL's checkin on the trunk:
> http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Lib/StringIO.py.diff?r1=1.19&r2=1.20
> Michael's back port to Python 2.2:
> http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Lib/StringIO.py.diff?r1=1.19&r2=1.19.12.1
I think you have to provide more information here, e.g.
the traceback and a dump of the local variables.

The patch only re-enables Unicode handling in StringIO,
so I can't see why this fails. It could be that minidom mixes
Unicode and 8-bit strings, and that the implicit coercion going
on inside StringIO then triggers the UnicodeError.
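
A rough sketch of the kind of coercion failure meant here. It is
written for a current Python 3 interpreter, which no longer coerces
implicitly, so the helper join_like_python2() is a hypothetical
stand-in that spells out the ASCII decode Python 2 applied on its own
when Unicode and 8-bit strings were joined. It is not the actual
minidom or StringIO code path, which has not been confirmed in this
thread.

```python
# Sketch only: imitates Python 2's implicit str -> unicode coercion.
# join_like_python2 is a hypothetical helper, not part of any library.

def join_like_python2(chunks):
    """Join text and byte chunks roughly the way Python 2's ''.join did:
    if any chunk is text, every byte chunk is first decoded as ASCII."""
    if any(isinstance(c, str) for c in chunks):
        return "".join(
            c if isinstance(c, str) else c.decode("ascii") for c in chunks
        )
    return b"".join(chunks)


text_chunk = "<name>"                      # Unicode text, e.g. from the parser
byte_chunk = "Jarosław".encode("utf-8")    # 8-bit string with non-ASCII bytes

try:
    join_like_python2([text_chunk, byte_chunk])
    failed = False
except UnicodeDecodeError as exc:
    # The decode step cannot handle bytes outside the 7-bit range.
    failed = True
    print("coercion failed:", exc.reason)
```

As long as every chunk is pure ASCII the join succeeds, which would
match the report that 7-bit input works and anything else does not.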
> 2. It looks to me like Python 2.2.1 does *not* include the
> StringIO.py that is part of the 'r221' Python CVS tag. Am I
> wrong?
>
> Any insight would be appreciated.
>
> Thanks,
> Trent
>
> Further information:
>
> - StringIO CVS log:
> http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Lib/StringIO.py
>
> - diff of Python Labs' 2.2.1 StringIO.py with StringIO.py in CVS at the
> 'r221' tag:
>
> C:\>diff -u C:\PythonLabs22\Lib\StringIO.py D:\cvs\python-r221\dist\src\Lib\StringIO.py
> --- C:\PythonLabs22\Lib\StringIO.py Mon Sep 24 13:34:52 2001
> +++ D:\cvs\python-r221\dist\src\Lib\StringIO.py Mon Mar 18 05:31:30 2002
>
> @@ -28,7 +28,7 @@
>    bytes that occupy space in the buffer.
>  - There's a simple test set (see end of this file).
>  """
> -
> +import types
>  try:
>      from errno import EINVAL
>  except ImportError:
> @@ -38,8 +38,10 @@
>
>  class StringIO:
>      def __init__(self, buf = ''):
> -        # Force self.buf to be a string
> -        self.buf = str(buf)
> +        # Force self.buf to be a string or unicode
> +        if type(buf) not in types.StringTypes:
> +            buf = str(buf)
> +        self.buf = buf
>          self.len = len(buf)
>          self.buflist = []
>          self.pos = 0
> @@ -135,8 +137,9 @@
>          if self.closed:
>              raise ValueError, "I/O operation on closed file"
>          if not s: return
> -        # Force s to be a string
> -        s = str(s)
> +        # Force s to be a string or unicode
> +        if type(s) not in types.StringTypes:
> +            s = str(s)
>          if self.pos > self.len:
>              self.buflist.append('\0'*(self.pos - self.len))
>              self.len = self.pos
>
> [Jarosław Zabiełło wrote]
> > I have a small code:
> >
> > from xml.dom import minidom
> > xmldoc = minidom.parse('myfile.xml')
> > print xmldoc.toxml()
> >
> > It works fine for 7-bit text. But the problem is that it works ONLY for
> > pure ASCII text. :-( If I try to use any non-English characters,
> > Python raises an exception:
> >
> > UnicodeError: ASCII encoding error: ordinal not in range(128)
> >
> > It does NOT work even on utf-8 XML files containing any character
> > outside the 7-bit ASCII character set. That is strange, because utf-8
> > should be parsed correctly by all XML tools.
> >
> > Does this mean the toxml() and toprettyxml() methods of minidom are
> > useless for non-English strings? I need them to cut one big XML file
> > into smaller pieces and write them into several files.
>
> [Jarosław Zabiełło wrote]
> > I found a solution.
> >
> > The latest release of ActivePython has a bad StringIO.py file! I compared
> > it with the one from Python 2.2.1rc from www.python.org, _which works fine_.
>
> --
> Trent Mick
> TrentM at ActiveState.com
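
For the original use case (cutting one document into pieces), a
minimal sketch that avoids the implicit conversion altogether by
encoding the serialized output explicitly. It assumes a current
Python 3 minidom, where toxml() accepts an encoding argument; that
is not the 2.2.1 API discussed above. The sample document and the
<p> element name are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical input: a small UTF-8 document with non-ASCII text.
source = "<people><p>Jarosław</p><p>Zabiełło</p></people>".encode("utf-8")
doc = minidom.parseString(source)

# Serialize each <p> element on its own and encode the result explicitly,
# so nothing depends on an implicit default encoding.
pieces = [p.toxml().encode("utf-8") for p in doc.getElementsByTagName("p")]

# Serialize the whole document; with an encoding given, toxml() returns
# bytes and writes a matching XML declaration.
whole = doc.toxml(encoding="utf-8")
```

Each entry in pieces can then be written to its own file opened in
binary mode.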
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/