[XML-SIG] I am stuck: 4DOM / utf-8

Horst Eyermann <horst@freedict.de>
Wed, 08 Aug 2001 10:21:51 -0000


Hello Martin,
thanks for your fast reply.

I tried it again, and now it works, though I don't know why :(
The code and the input file are the same, but I must have changed something
somewhere...

Anyway, this is how I read in the data:
 
from xml.dom import Node
from xml.dom import ext
from xml.dom.ext.reader import PyExpat
from xml import xpath
from xml.xpath import Util
from Ft.Lib.pDomlette import PyExpatReader
 
from StringIO import StringIO


        self.reader = PyExpatReader()
        self.doc = self.reader.fromUri(fileName)
        Util.IndexDocument(self.doc)
        self.result = xpath.Evaluate('//entry', contextNode=self.doc)
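
The reading step above depends on PyXML and 4Suite. As a rough sketch of the
same idea (parse a file, then collect every entry element anywhere in the
tree), the version below assumes Python 3 and the standard library's
xml.etree.ElementTree in place of PyExpatReader and xpath.Evaluate. The
function name find_entries and the sample document are made up for
illustration and are not part of the original code.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample input; a real dictionary file would be read from disk.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<dictionary>
  <entry><orth>Haus</orth></entry>
  <group><entry><orth>Tür</orth></entry></group>
</dictionary>
""".encode("utf-8")


def find_entries(source):
    """Parse source (a file name or binary file object) and return all <entry> elements."""
    tree = ET.parse(source)
    # iter("entry") walks the whole tree in document order, which is
    # comparable to the XPath expression '//entry' used above.
    return list(tree.iter("entry"))


entries = find_entries(io.BytesIO(SAMPLE))
# Element text comes back as text strings, never as raw UTF-8 bytes.
words = [entry.findtext("orth") for entry in entries]
print(words)
```

The parser decodes the UTF-8 input itself, so everything taken out of the
tree is already a text string.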

 
To be honest, I had simply assumed that strings would be Unicode strings. I will
do some reading on Unicode objects.
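
The difference between encoded strings and Unicode objects can be shown in a
few lines. This is only a sketch, and it assumes Python 3, where the byte
type is called bytes and the text type is called str (in the Python 2
releases discussed in this thread they were str and unicode). The sample
word is made up.

```python
# The word "Tür" as UTF-8 encoded bytes, as it would sit in a file on disk.
raw = b"T\xc3\xbcr"

# Decoding turns the bytes into a text (Unicode) string.
text = raw.decode("utf-8")

print(len(raw))   # 4 bytes, because "ü" takes two bytes in UTF-8
print(len(text))  # 3 characters

# Plain ASCII data is the harmless case: one byte per character.
ascii_text = b"Haus".decode("ascii")

# Bytes outside the ASCII range cannot be decoded as ASCII.
try:
    raw.decode("ascii")
    decoded_as_ascii = True
except UnicodeDecodeError:
    decoded_as_ascii = False
```

This matches the rule quoted below: only plain ASCII strings are safe to mix
with Unicode objects, and anything else has to be decoded first.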

Thanks

Horst

On 07-Aug-01 Martin v. Loewis wrote:
>> I converted an XML file encoded in utf-8 into a DOM structure (PyXML-0.6.5).
> 
> How exactly did you do that? Did you use one of the PyXML parsers? If
> so, which one?
> 
>> Then I try to split the document into smaller subparts and store the
>> parts into a database, and display them with tkinter.
> 
> I suppose you use DOM manipulation functions for that? Did you pass
> strings to those functions, or Unicode objects? If strings, did they
> contain non-ASCII characters? If so, it would explain the problem: You
> must always put Unicode objects into DOM trees; the only exception is
> plain ASCII strings (i.e. no accented or otherwise funny characters).
> 
>> For extraction of XML from the DOM, the best function I found was
>> PrettyPrint, which unfortunately does not support direct assignment to
>> a string. So I followed the examples utilizing the StringIO
>> library. However, every time I try to access the stream, I get an
>> error (see below).  What should I do?
> 
> If you cannot figure out the problem, it would be helpful if you'd
> change your code to
> 
>         stream = StringIO()
>         ext.PrettyPrint(value, stream=stream)
>         print repr(stream.buflist)
>         stream.seek(0)
>         text = load(stream)
>         stream.close()
> 
> and post the contents of the buflist, together with the XML file that
> was the input. If it is a large file, or if you don't want to post it
> to the general public, I'd appreciate a private message.
> 
> I'm very much surprised that Unicode objects end up in the StringIO,
> but that may be a bug in PyXML - in principle, everything ought to be
> UTF-8 encoded by PrettyPrint.
> 
> Regards,
> Martin

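The StringIO pattern from the quoted message (write the XML into an
in-memory stream, then read the text back) can be sketched with the standard
library as well. This assumes Python 3 and xml.dom.minidom in place of
PyXML's ext.PrettyPrint, so it is not the PyXML API itself. The sample
element is made up.

```python
import io
from xml.dom import minidom

# Hypothetical sample document, parsed from UTF-8 encoded bytes.
doc = minidom.parseString("<entry><orth>Tür</orth></entry>".encode("utf-8"))

# Write the element into an in-memory text stream, then read it back.
stream = io.StringIO()
doc.documentElement.writexml(stream, addindent="  ", newl="\n")
text = stream.getvalue()
stream.close()

# Encode explicitly once bytes are needed, for example before storing
# the fragment in a database column.
data = text.encode("utf-8")
print(text)
```

Keeping the fragment as text until the last moment, and encoding only at the
boundary, avoids mixing encoded strings and Unicode objects in one stream.
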
Horst@freedict.de
Horst Eyermann 
Germany

You need a dictionary? - visit http://www.freedict.de
for free (GPL) dictionaries (Unix; Windows work in progress)
For Windows, visit http://www.freedict.de/wbuch

An article (in German) about dictionary efforts on the net
http://www.heise.de/tp/deutsch/inhalt/on/5927/1.html