Sax2 encoding
Alexandre Fayolle
alf at logilab.fr
Fri Aug 30 07:44:20 EDT 2002
In article <mailman.1030702904.8805.python-list at python.org>,
Juan M. Casillas wrote:
>
> Hello folks!
>
>
> I have an xml document that only begins with
>
><?xml version="1.0"?>
> [...]
>
>
> That is, without any info about the encoding. This document has special
> characters encoded in ISO-8859-1 format (spanish characters just like
> á, or ñ).
Then your document is not well-formed XML, and you will have big trouble
parsing it. It should begin with
<?xml version="1.0" encoding="iso-8859-1"?>
If you can't change this yourself, you should ask the author to do it.
And if he refuses, you should convert it to UTF-8 using Python's
codecs module before parsing it.
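A minimal sketch of that conversion, assuming Python 3 and the standard
codecs and xml.sax modules. The sample document and the helper names
latin1_to_utf8 and TextCollector are hypothetical, for illustration only:

```python
import codecs
import xml.sax


def latin1_to_utf8(data):
    """Re-encode ISO-8859-1 bytes as UTF-8 bytes (hypothetical helper)."""
    # Decode with the encoding the author actually used...
    text = codecs.decode(data, "iso-8859-1")
    # ...then encode with the one an XML parser assumes when the
    # declaration carries no encoding attribute.
    return codecs.encode(text, "utf-8")


class TextCollector(xml.sax.ContentHandler):
    """Collect character data, just to check that the parse worked."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # Expat may deliver text in several pieces, so keep them all.
        self.chunks.append(content)


# Hypothetical sample: no encoding declaration, Latin-1 bytes in the body.
raw = '<?xml version="1.0"?><p>año, más</p>'.encode("iso-8859-1")

handler = TextCollector()
xml.sax.parseString(latin1_to_utf8(raw), handler)
print("".join(handler.chunks))  # año, más
```

Since the declaration says nothing about the encoding, the re-encoded bytes
are exactly what the parser expects, so the declaration can stay as it is.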
> and poking around the file, I found a 'á' character at this position.
> So my question is... how can I set the default encoding for the sax2
> reader so the XML parser works for me ?
The default encoding is UTF-8, because this is what the XML
specification mandates when no encoding is declared. You cannot change it.
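To see the default in action, here is a small sketch, assuming Python 3's
bundled expat-based xml.sax (the helper name parses is hypothetical). The
same Latin-1 bytes are rejected when there is no encoding declaration and
accepted once the real encoding is declared:

```python
import xml.sax
from xml.sax import SAXParseException


def parses(data):
    """Return True if xml.sax accepts the given bytes, False otherwise."""
    try:
        xml.sax.parseString(data, xml.sax.ContentHandler())
        return True
    except SAXParseException:
        return False


body = "<p>ñ</p>".encode("iso-8859-1")  # 0xF1 is not valid UTF-8 here

# No encoding declaration: the bytes are read as UTF-8 and rejected.
print(parses(b'<?xml version="1.0"?>' + body))  # False

# Declaring the real encoding makes the same bytes well-formed.
print(parses(b'<?xml version="1.0" encoding="iso-8859-1"?>' + body))  # True
```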
--
Alexandre Fayolle
--
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Narval, the first software agent available as free software (GPL).