[XML-SIG] I am stuck: 4DOM / utf-8
Horst Eyermann <horst@freedict.de>
Wed, 08 Aug 2001 10:21:51 -0000
Hello Martin,
thanks for your fast reply.
I tried it again, and now it works, without my knowing why :(
The same code, the same input file, but I must have changed something
somewhere...
Anyway, this is how I read in the data:
from xml.dom import Node
from xml.dom import ext
from xml.dom.ext.reader import PyExpat
from xml import xpath
from xml.xpath import Util
from Ft.Lib.pDomlette import PyExpatReader
from StringIO import StringIO

# Parse the input file into a DOM document
self.reader = PyExpatReader()
self.doc = self.reader.fromUri(fileName)
# Index the document before running XPath queries on it
Util.IndexDocument(self.doc)
# Select every <entry> element in the document
self.result = xpath.Evaluate('//entry', contextNode=self.doc)
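
For readers without PyXML or 4Suite installed, the read-and-select step above can be approximated with the standard library. This is only a sketch under stated assumptions: it uses xml.dom.minidom as a stand-in for the PyExpatReader, getElementsByTagName as a stand-in for the '//entry' XPath query, and a made-up sample document (XML_BYTES and load_entries are hypothetical names, not part of the original code).

```python
from xml.dom import minidom

# Hypothetical sample input: a UTF-8 encoded document containing a
# non-ASCII character (a-umlaut, written as an escape for portability).
XML_BYTES = (
    '<?xml version="1.0" encoding="utf-8"?>'
    "<dict><entry>B\u00e4r</entry><entry>cat</entry></dict>"
).encode("utf-8")


def load_entries(xml_bytes):
    """Parse UTF-8 bytes into a DOM and return (document, entry elements)."""
    doc = minidom.parseString(xml_bytes)
    # getElementsByTagName matches at any depth, similar to '//entry'
    return doc, doc.getElementsByTagName("entry")


doc, entries = load_entries(XML_BYTES)
# Text pulled out of the DOM is a Unicode string, not UTF-8 bytes
texts = [entry.firstChild.data for entry in entries]
print(texts)
```

The point of the sketch is that the parser decodes the UTF-8 input itself, so whatever comes back out of the tree is already Unicode.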
To be honest, I had simply assumed that strings would be Unicode strings. I will
do some reading on Unicode objects.
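
The distinction between byte strings and Unicode objects that Martin describes in the quoted text can be illustrated roughly as follows. This is a sketch in current Python, where bytes and str are separate types; xml.dom.minidom and io.StringIO stand in for 4DOM and the old StringIO module, and the sample word is made up.

```python
import io
from xml.dom import minidom

# Bytes as they would arrive from a UTF-8 file or a database column
raw = "B\u00e4r".encode("utf-8")
# Decode to a Unicode string before handing the value to the DOM
text = raw.decode("utf-8")

# Build a tiny document that holds the decoded text
doc = minidom.Document()
entry = doc.createElement("entry")
entry.appendChild(doc.createTextNode(text))
doc.appendChild(entry)

# Serialize into an in-memory text stream instead of a file
stream = io.StringIO()
doc.writexml(stream, addindent="  ", newl="\n")
xml_text = stream.getvalue()
stream.close()

# Encode explicitly only at the boundary (file, database, socket)
xml_bytes = xml_text.encode("utf-8")
print(xml_text)
```

Decoding on the way in and encoding on the way out keeps everything inside the DOM tree as Unicode, which is the rule quoted below.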
Thanks
Horst
On 07-Aug-01 Martin v. Loewis wrote:
>> I converted a XML file encoded in utf-8 into a DOM structure (PyXML-0.6.5).
>
> How exactly did you do that? Did you use one of the PyXML parsers? If
> so, which one?
>
>> Then I try to split the document into smaller subparts and store the
>> parts into a database, and display them with tkinter.
>
> I suppose you use DOM manipulation functions for that? Did you pass
> strings to those functions, or Unicode objects? If strings, did they
> contain non-ASCII characters? If so, it would explain the problem: You
> must always put Unicode objects into DOM trees; the only exception is
> plain ASCII strings (i.e. no accented or otherwise funny characters).
>
>> For extraction of XML from the DOM, the best function I found was
>> PrettyPrint, which unfortunately does not support direct assigning to
>> a string. So I followed the examples utilizing the StringIO
>> library. However, every time I try to access the stream, I get an
>> error (see below). What should I do?
>
> If you cannot figure out the problem, it would be helpful if you'd
> change your code to
>
> stream = StringIO()
> ext.PrettyPrint(value, stream=stream)
> print repr(stream.buflist)
> stream.seek(0)
> text = load(stream)
> stream.close()
>
> and post the contents of the buflist, together with the XML file that
> was the input. If it is a large file, or if you don't want to post it
> to the general public, I'd appreciate getting a private message.
>
> I'm very much surprised that Unicode objects end up in the StringIO,
> but that may be a bug in PyXML - in principle, everything ought to be
> UTF-8 encoded by PrettyPrint.
>
> Regards,
> Martin
Horst@freedict.de
Horst Eyermann
Germany
You need a dictionary? - visit http://www.freedict.de
for free (GPL) dictionaries (unix; windows work in progress)
For windows, visit http://www.freedict.de/wbuch
An article (in German) about dictionary efforts on the net
http://www.heise.de/tp/deutsch/inhalt/on/5927/1.html