Pyexpat and iso-8859-1

C.Laurence Gonsalves clgonsal at keeshah.penguinpowered.com
Tue Jun 20 04:50:29 EDT 2000


On Tue, 13 Jun 2000 07:32:07 -0500, Marc Jeurissen <mjeuris at lib.ua.ac.be> wrote:
> When I parse an XML file containing the declaration '<?xml version="1.0"
> encoding="iso-8859-1"?>' with Pyexpat, every iso-latin1 character is
> being replaced by 2 new characters, the first of which is nearly always
> #195 (Ã), the second one has a decimal value of 64 less than the
> original character value.
> 
> Some examples:
> 
> #233 (é) becomes #195 + #169 (©)
> #231 (ç) becomes #195 + #167 (§)
> #239 (ï) becomes #195 + #175 (¯)
> 
> Does anyone know what to do about this?

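I don't know what to do about it, but it looks like the characters are
getting UTF-8 encoded: in UTF-8, Latin-1 characters #192 through #255
become #195 followed by the original value minus 64, which matches your
examples. I have no experience with UTF-8 in Python, but a bit of
searching on www.python.org turned up this: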

    http://w1.132.telia.com/~u13208596/unicode.htm
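
For what it's worth, here is a rough sketch of the arithmetic and of one
way to get back to iso-8859-1. It assumes a modern Python with Unicode
strings, and it has not been tried against Pyexpat itself; the function
names are made up for illustration and are not part of any library.

```python
# Hypothetical helpers; none of these names come from Pyexpat.

def utf8_byte_values(ch):
    """Return the decimal values of the bytes that UTF-8 uses for ch."""
    return list(ch.encode("utf-8"))


def to_latin1(text):
    """Encode a Unicode string as iso-8859-1 bytes."""
    return text.encode("iso-8859-1")


def undo_misread(garbled):
    """Repair text whose UTF-8 bytes were displayed as iso-8859-1.

    Each character is mapped back to its single byte, and the resulting
    byte string is then decoded as the UTF-8 it really was.
    """
    return garbled.encode("iso-8859-1").decode("utf-8")


if __name__ == "__main__":
    # The three characters from the examples in the question.
    for ch in "\u00e9\u00e7\u00ef":
        print(ord(ch), "->", utf8_byte_values(ch))
    print(undo_misread("\u00c3\u00a9"))
```

If the parser hands back Unicode strings, to_latin1() would be the step
to apply before writing the data out as iso-8859-1; undo_misread() only
applies when the two-character sequences have already ended up in a
string.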

HTH...

-- 
  C. Laurence Gonsalves                "Any sufficiently advanced
  clgonsal at kami.com                     technology is indistinguishable
  http://www.cryogen.com/~clgonsal/     from magic." -- Arthur C. Clarke



