[issue1290] xml.dom.minidom not able to handle utf-8 data

Sharmila Sivakumar report at bugs.python.org
Thu Oct 18 06:45:14 CEST 2007


Sharmila Sivakumar added the comment:

Oops, Facundo, that will work. It actually fails *after the DOM
construction*, when you do

mydom.firstChild.childNodes

I request you to try it again.

The problem is that some encoding and decoding is done within the parser,
and it uses the default encoding, 'ascii'.  This fails for UTF-8 data.
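
To illustrate the failure mode, here is a minimal sketch. It is written for
Python 3, where the decode has to be spelled out; on Python 2.5 the same
ASCII decode happens implicitly whenever a byte string is combined with a
unicode string. The sample document and the variable names are made up for
illustration and are not taken from testdata.txt.

```python
import xml.dom.minidom as dom

# Hypothetical sample data; the real testdata.txt is not reproduced here.
data = u'<root><item>caf\u00e9</item></root>'.encode('utf-8')

# Decoding UTF-8 bytes with the 'ascii' codec is the step that fails
# as soon as the data contains a non-ASCII character.
try:
    data.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the codec the data was actually written in succeeds.
text = data.decode('utf-8')

# Parsing the raw bytes and then walking the tree, as in the report.
mydom = dom.parseString(data)
children = mydom.firstChild.childNodes
item_text = children[0].firstChild.data
```

The point of the sketch is only that the codec must match the data: any code
path that falls back to 'ascii' will raise on input like this.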

On 10/18/07, Sharmila Sivakumar <report at bugs.python.org> wrote:
>
>
> Sharmila Sivakumar added the comment:
>
> Thanks for your quick response Facundo.
>
> I'm working on Ubuntu 7.04, python 2.5.1
> Python 2.5.1 (r251:54863, May 2 2007, 16:56:35)
> [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
>
> This error occurs when the default encoding is 'ascii'.  When I change the
> default encoding to 'utf-8' it works for me too.  Is, by any chance, your
> default encoding 'utf-8'?
>
> On 10/18/07, Facundo Batista <report at bugs.python.org> wrote:
> >
> >
> > Facundo Batista added the comment:
> >
> > Downloaded the testdata.txt file, and yes, it's UTF-8:
> >
> > facundo at pomcat:~/devel$ file testdata.txt
> > testdata.txt: UTF-8 Unicode text
> >
> > But I opened it perfectly!
> >
> > Python 2.5.1 (r251:54863, May  2 2007, 16:56:35)
> > [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> import xml.dom.minidom as dom
> > >>> data = open('testdata.txt','r').read()
> > >>> mydom = dom.parseString(data)
> > >>> mydom
> > <xml.dom.minidom.Document instance at 0xb7c03b0c>
> > >>>
> >
> > Which platform are you working on?
> >
> > And yes, you have absolute permission to fix it, patches are always
> > welcome!
> >
> > ----------
> > nosy: +facundobatista
> > resolution:  -> works for me
> > status: open -> closed
> >
> > __________________________________
> > Tracker <report at bugs.python.org>
> > <http://bugs.python.org/issue1290>
> > __________________________________
> >
>
> Added file: http://bugs.python.org/file8559/unnamed
>
> __________________________________
> Tracker <report at bugs.python.org>
> <http://bugs.python.org/issue1290>
> __________________________________
>

Added file: http://bugs.python.org/file8560/unnamed

__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue1290>
__________________________________