Pyexpat and iso-8859-1
C. Laurence Gonsalves
clgonsal at keeshah.penguinpowered.com
Tue Jun 20 04:50:29 EDT 2000
On Tue, 13 Jun 2000 07:32:07 -0500, Marc Jeurissen <mjeuris at lib.ua.ac.be> wrote:
> When I parse an XML file containing the declaration '<?xml version="1.0"
> encoding="iso-8859-1"?>' with Pyexpat, every iso-latin1 character is
> being replaced by 2 new characters, the first of which is nearly always
> #195 (Ã) and the second of which has a decimal value 64 less than the
> original character value.
>
> Some examples:
>
> #233 (é) becomes #195 + #169 (©)
> #231 (ç) becomes #195 + #167 (§)
> #239 (ï) becomes #195 + #175 (¯)
>
> Does anyone know what to do about this?
I don't know what to do about it, but it looks like the characters are
getting UTF-8 encoded. I have no experience with UTF-8 in Python, but a
bit of searching on www.python.org turned up this:
http://w1.132.telia.com/~u13208596/unicode.htm
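
To check the UTF-8 guess against the examples quoted above, here is a
minimal sketch. It assumes a current Python 3 (str/bytes semantics), which
is not what the original question was run on, and the helper names
utf8_bytes and recover_latin1 are made up for illustration; they are not
part of Pyexpat.

```python
def utf8_bytes(code_point):
    """Return the UTF-8 byte values for a single code point."""
    return list(chr(code_point).encode("utf-8"))


def recover_latin1(data):
    """Hypothetical helper: convert UTF-8 bytes back to ISO-8859-1 bytes."""
    return data.decode("utf-8").encode("iso-8859-1")


# The three characters from the quoted examples: é, ç, ï.
for cp in (233, 231, 239):
    first, second = utf8_bytes(cp)
    # For code points 192..255 the lead byte is 195 and the
    # second byte is the original value minus 64.
    print(cp, "->", first, second, "(difference:", cp - second, ")")

# Assuming the parser really hands back UTF-8 bytes, this undoes it.
print(list(recover_latin1(bytes([195, 169]))))
```

Code points 160 to 191 would get 194 as the lead byte and keep their value
in the second byte, which would explain why the first character is only
"nearly always" #195.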
HTH...
--
C. Laurence Gonsalves
clgonsal at kami.com
http://www.cryogen.com/~clgonsal/

"Any sufficiently advanced technology is indistinguishable
from magic." -- Arthur C. Clarke