Is it possible to consume UTF8 XML documents using xml.dom.pulldom?

Peter Otten __peter__ at web.de
Wed Jul 30 13:23:46 EDT 2008


Paul Boddie wrote:

> On 30 Jul, 18:17, Simon Willison <si... at simonwillison.net> wrote:
>>
>> Some very useful people in #python on Freenode pointed out that my bug
>> occurs because I'm trying to display things interactively in the
>> console. Saving to a variable instead fixes the problem.
> 
> What's strange about that is how the object is represented when
> displayed:
> 
> ('CHARACTERS', <DOM Text node "Simon\u2019s XM...">)
> 
> Here, there's no attempt made to encode \u2019 as an ASCII byte
> sequence. Does the OS X version of Python do anything special with
> string representations?

I'm on Kubuntu 7.10 and see the same error as Simon. The problem is in the
minidom.CharacterData class, which has the following method:

    def __repr__(self):
        data = self.data
        if len(data) > 10:
            dotdotdot = "..."
        else:
            dotdotdot = ""
        return "<DOM %s node \"%s%s\">" % (
            self.__class__.__name__, data[0:10], dotdotdot)

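The data attribute is a unicode instance, so the formatted string is unicode
as well, and repr() then has to encode it to a byte string with the default
(ASCII) codec, which fails on \u2019.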
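
As a rough illustration of the point above, here is a minimal sketch of an ASCII-only representation for such a node. The helper name ascii_safe_repr and the sample document are made up for this example; they are not part of minidom, and this is just one possible workaround rather than a fix to the library. It relies only on the standard "backslashreplace" error handler, which escapes non-ASCII characters instead of raising.

```python
from xml.dom import minidom


def ascii_safe_repr(node):
    """Hypothetical helper: describe a character-data node using ASCII only."""
    data = node.data
    dotdotdot = "..." if len(data) > 10 else ""
    # backslashreplace turns non-ASCII characters into backslash escapes
    # instead of raising UnicodeEncodeError.
    shown = data[0:10].encode("ascii", "backslashreplace").decode("ascii")
    return '<DOM %s node "%s%s">' % (node.__class__.__name__, shown, dotdotdot)


# Sample input (an assumption for this sketch): UTF-8 bytes containing U+2019.
doc = minidom.parseString(u"<title>Simon\u2019s XML feed</title>".encode("utf-8"))
text_node = doc.documentElement.firstChild

# The plain ASCII codec cannot represent U+2019; this is the kind of
# failure that occurs when the unicode repr has to become a byte string.
try:
    text_node.data.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

safe = ascii_safe_repr(text_node)
print(safe)
```

Calling the helper instead of relying on the node's own __repr__ keeps the interactive display free of non-ASCII characters, whatever encoding the console uses.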

Peter


