[Expat-discuss] (no subject)

Karl Waclawek karl at waclawek.net
Mon Jun 7 12:15:13 EDT 2004


----- Original Message ----- 
From: "Mark" <11mjazbdg02 at sneakemail.com>
To: <expat-discuss at libexpat.org>
Sent: Monday, June 07, 2004 12:07 PM


> Hi,
>
> [expat 1.95.6]
>
> I have an XML file which uses iso-8859-1 encoding. Therefore the
> file begins with:
>
>    <?xml version="1.0" encoding="iso-8859-1"?>
>
> In the file I have some character data which includes an accented
> character (u umlaut - byte 0xFC).  When I parse the file
> the character code gets translated (by expat) to two bytes (0xC3, 0xBC). I can
> see this in my CharacterDataHandler function.
>
> What am I doing wrong?  Do I have to preprocess the file to
> change this to a character reference?

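You are doing nothing wrong.
Expat decodes the encoding declared in your file, but it does
not report the data to your handlers in that encoding. It always
reports Unicode: UTF-8 by default, or UTF-16 if Expat was built
with XML_UNICODE. The two bytes 0xC3 0xBC you see are simply the
UTF-8 encoding of U+00FC (u umlaut). This is standard behaviour
for an XML parser.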
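
If your application really needs ISO-8859-1 bytes, one option is to
convert the UTF-8 text back yourself after Expat hands it to you.
Below is a minimal sketch of such a conversion. The function name
utf8_to_latin1 is made up for this example and is not part of the
Expat API. It assumes each buffer holds whole UTF-8 characters, and
it gives up on anything that does not fit in ISO-8859-1.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Expat.
 * Converts len bytes of UTF-8 text in `in` to ISO-8859-1 in `out`.
 * `out` must have room for at least len bytes.
 * Returns the number of bytes written, or (size_t)-1 if the input
 * contains a malformed sequence or a character above U+00FF. */
size_t utf8_to_latin1(const char *in, size_t len, unsigned char *out)
{
    size_t i = 0;   /* read position in `in`   */
    size_t n = 0;   /* write position in `out` */

    while (i < len) {
        unsigned char c = (unsigned char)in[i];

        if (c < 0x80) {
            /* Plain ASCII: identical in UTF-8 and ISO-8859-1. */
            out[n++] = c;
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len &&
                   ((unsigned char)in[i + 1] & 0xC0) == 0x80) {
            /* Two-byte sequence for U+0080..U+00FF:
             * lead byte 110000xx, continuation byte 10yyyyyy. */
            unsigned char c2 = (unsigned char)in[i + 1];
            out[n++] = (unsigned char)(((c & 0x03) << 6) | (c2 & 0x3F));
            i += 2;
        } else {
            /* Outside ISO-8859-1, or not valid UTF-8. */
            return (size_t)-1;
        }
    }
    return n;
}
```

You could call something like this from your CharacterDataHandler,
passing the pointer and length Expat gives you. Remember that the
data passed to the handler is not NUL-terminated, and that a single
run of text may be delivered in several calls.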

Karl
