minidom and unicode errors

Abhimanyu Seth abhimanyu.seth at gmail.com
Tue Mar 7 01:50:37 EST 2006


On 3/7/06, Fredrik Lundh <fredrik at pythonware.com> wrote:
>
> Abhimanyu Seth wrote:
>
> > > I have the following line in my xml file:
> > > <target>Exception beim Löschen des Audit-Moduls aufgetreten. Exception Stack
> > > lautet: %1.</target>
> > > ExpatError: not well-formed (invalid token): line 8, column 27
>
> > I've specified utf-8 in the xml header
> > <?xml version="1.0" encoding="utf-8"?>
>
> are you sure you're using utf-8 in the XML file?  the ö you pasted into
> your mail is an iso-8859-1 code, not an utf-8 code.
>
> > Anyway,
> > >> f = codecs.open ("c:/test.txt", "r", "latin-1")
> > >> dom = minidom.parseString (codecs.encode (f.read(), "utf-8"))
> > works.
>
> which means that you've labelled the file as utf-8, but that it actually
> contains iso-8859-1.  fixing the file should fix this.
>
> </F>
>
Sorry, my mistake. The file was not saved as utf-8. Saving it as utf-8
solves my problems.
>> import codecs
>> from xml.dom import minidom
>> f = codecs.open("c:/test.txt", "r", "utf-8")
>> dom = minidom.parseString(codecs.encode(f.read(), "utf-8"))

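However, I still need to encode the string returned by f.read() back to
utf-8 before passing it to parseString, presumably because codecs.open
hands back a decoded unicode string rather than raw bytes. Otherwise I
get an exception.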
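
One way to avoid the decode-then-encode round trip is to hand the parser the raw bytes and let it follow the encoding named in the XML declaration. The snippet below is only a rough sketch of that idea: parse_xml_bytes and XML_BYTES are made-up names for illustration, the sample document is invented, and the utf-8 check assumes the file is declared as utf-8 (as mine is).

```python
from xml.dom import minidom

# Hypothetical sample document: utf-8 bytes with a matching declaration.
XML_BYTES = (
    u'<?xml version="1.0" encoding="utf-8"?>\n'
    u'<target>Exception beim L\u00f6schen des Audit-Moduls aufgetreten.</target>'
).encode("utf-8")


def parse_xml_bytes(data):
    """Parse raw XML bytes, leaving the decoding to the XML parser."""
    # Sanity check only: raises UnicodeDecodeError if the bytes are not
    # really utf-8 (for example, a file that was saved as latin-1).
    data.decode("utf-8")
    # The parser receives bytes, so no manual re-encoding is needed.
    return minidom.parseString(data)


dom = parse_xml_bytes(XML_BYTES)
text = dom.documentElement.firstChild.data
```

For a file on disk, the same thing should work by opening it in binary mode ("rb") and passing f.read() to the function, or by giving the file name straight to minidom.parse.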

Thanks anyway for all the help.

--
Regards,
Abhimanyu