Sax2 encoding

Alexandre Fayolle alf at logilab.fr
Fri Aug 30 07:44:20 EDT 2002


In article <mailman.1030702904.8805.python-list at python.org>, 
Juan M. Casillas wrote:
> 
> Hello folks!
> 
> 
> I have an xml document that only begins with
> 
><?xml version="1.0"?>
> [...]
> 
> 
> That is, without any info about the encoding. This document has special
> characters encoded in ISO-8859-1 format (Spanish characters such as
> á or ñ). 

Then your document is not well-formed XML, and you will have a lot of
trouble parsing it. It should begin with 
<?xml version="1.0" encoding="iso-8859-1"?>

If you can't change this yourself, you should ask the author to do it.
If they won't, you should convert the document to UTF-8 using Python's
codecs module before parsing it.
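
A minimal sketch of that conversion, assuming a current Python 3 (this
post dates from the Python 2 era) and assuming the bytes really are
ISO-8859-1. The TextCollector handler, the parse_latin1_xml helper and
the sample document are made up for illustration:

```python
import codecs
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers all character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_latin1_xml(raw_bytes, handler):
    """Re-encode ISO-8859-1 bytes as UTF-8, then hand them to SAX.

    Assumes the document carries no encoding declaration, so the
    parser will read the re-encoded bytes as UTF-8.
    """
    text = codecs.decode(raw_bytes, "iso-8859-1")
    utf8_bytes = codecs.encode(text, "utf-8")
    xml.sax.parseString(utf8_bytes, handler)


# Sample document: no encoding declaration, ISO-8859-1 bytes inside.
raw = '<?xml version="1.0"?><doc>á ñ</doc>'.encode("iso-8859-1")

handler = TextCollector()
parse_latin1_xml(raw, handler)
result = "".join(handler.chunks)
```

The handler receives ordinary Unicode strings, so nothing downstream
needs to know what encoding the file was originally written in.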
 
> and poking around the file, I found an 'á' character at this position.
> So my question is... how can I set the default encoding for the sax2
> reader so the XML parser works for me ?

The default encoding is UTF-8, because this is what the XML
specification mandates. You cannot change it. 

-- 
Alexandre Fayolle
-- 
LOGILAB, Paris (France).
http://www.logilab.com   http://www.logilab.fr  http://www.logilab.org
Narval, the first software agent available as free software (GPL).


