[XML-SIG] Processing xml files with ISO 8859-1 chars
Dan Gunter
dkgunter@lbl.gov
Wed, 07 Nov 2001 07:02:35 -0800
The simple answer is that the XML parser is illiterate. Since there are
no bit patterns that are illegal in UTF-8, I don't see how the parser
could know that the chosen encoding produced, from the user's
perspective, garbage. The pretty-printer, on the other hand, knows the
difference between printable and non-printable characters and can thus
complain.
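
As a side check on that premise, here is a minimal sketch, assuming a
modern Python 3 interpreter rather than the PyXML setup discussed in
this thread. The helper name is_valid_utf8 is made up for illustration.
It suggests that strict UTF-8 does reject some byte sequences, including
the 0xE7 bytes from pau.xml when they are followed by plain ASCII, so a
validating decoder could in principle notice the mismatch. Whether the
parser in question performs such a check is not something this sketch
shows.

```python
# The accented line from pau.xml as ISO-8859-1 bytes (0xE7 is c-cedilla).
raw = b"Houve mudan\xe7as nos pre\xe7os?"


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# 0xE7 announces a three-byte UTF-8 sequence, but the next byte is
# ASCII 'a' (0x61), which is not a continuation byte.
print(is_valid_utf8(raw))
print(is_valid_utf8(b"This line is ok."))

# The same bytes read under the encoding the file was actually saved in.
print(raw.decode("iso-8859-1"))
```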
Dan
Thomas B. Passin wrote:
> It seems that this xml file should have caused an exception, since it is
> not well-formed: the actual encoding does not match the presumed encoding
> (namely, utf-8). The fact that the parse partially succeeded is
> disturbing.
>
> I tried this example myself. I am running pyxml 6.6 on Windows 2000. I did
> get an exception, but it was from the pretty-printer, not the parser.
> Adding an xml declaration declaring the actual iso-8859-1 encoding did in
> fact allow the program to complete properly, as expected.
>
> Why didn't the parser complain?
>
> Cheers,
>
> Tom P
>
>
> [Rodrigo Senra]
>
>
>> I don't know if I stepped in a bug or it is just my newbieness ;o)
>> Trying to parse the file:
>>
>>------------------- pau.xml -------------------
>><note>
>> <assunto>
>> This line is ok.
>> This line has characters ISO-8859-1 with accents: Houve mudanças nos preços?
>> Linha ok.
>> </assunto>
>></note>
>>------------------ end of file pau.xml --------
>>
>>with the script:
>>
>>------------------ file teste.py ----------------------------
>>from xml.dom.ext.reader import Sax2
>>from xml.dom.ext import PrettyPrint
>>
>>doc = Sax2.FromXmlStream(open('pau.xml'))
>>PrettyPrint(doc,encoding='iso-8859-1')
>>-------------------- end of teste.py script ------------
>>
>>produces:
>>
>>----------- stdout trace -------------
>><?xml version='1.0' encoding='iso-8859-1'?>
>><!DOCTYPE note>
>><note>
>> <assunto>
>> This line is ok.
>>
>> Linha ok.
>> </assunto>
>></note>
>>----------- end of trace -------------
>>
>>Am I doing something obviously wrong ? Should I try another parser ?
>>
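
On the question of trying another parser and on the declaration fix
mentioned in this thread: the following is a minimal sketch, assuming a
modern Python 3 standard library (xml.dom.minidom on top of expat)
instead of the PyXML Sax2 reader and PrettyPrint used above. The names
BODY, DECL and parse_text are illustrative only. It feeds the same
ISO-8859-1 bytes to the parser once without and once with an encoding
declaration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Document body as ISO-8859-1 bytes; 0xE7 is c-cedilla.
BODY = b"<note><assunto>Houve mudan\xe7as nos pre\xe7os?</assunto></note>"
# Declaration naming the encoding the bytes are actually in.
DECL = b"<?xml version='1.0' encoding='iso-8859-1'?>\n"


def parse_text(data: bytes) -> str:
    """Parse data and return the text inside the <assunto> element."""
    doc = minidom.parseString(data)
    return doc.getElementsByTagName("assunto")[0].firstChild.data


# Without a declaration the parser assumes UTF-8 for these bytes.
try:
    parse_text(BODY)
    undeclared_ok = True
except ExpatError:
    undeclared_ok = False

# With the declaration the bytes are decoded as ISO-8859-1.
text = parse_text(DECL + BODY)

print(undeclared_ok)
print(text)
```

With this particular parser the undeclared variant is rejected rather
than partially parsed, and the declared variant yields the accented text
intact. Other parsers, including the one used in this thread, may behave
differently.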