[Expat-discuss] (no subject)
Karl Waclawek
karl at waclawek.net
Mon Jun 7 12:15:13 EDT 2004
----- Original Message -----
From: "Mark" <11mjazbdg02 at sneakemail.com>
To: <expat-discuss at libexpat.org>
Sent: Monday, June 07, 2004 12:07 PM
> Hi,
>
> [expat 1.95.6]
>
> I have an XML file which uses iso-8859-1 encoding. Therefore the
> file begins with:
>
> <?xml version="1.0" encoding="iso-8859-1"?>
>
> In the file I have some character data which includes an accented
> character (u umlaut - byte 0xFC). When I parse the file
> the character code gets translated (by expat) to two bytes (0xC3, 0xBC). I can
> see this in my CharacterDataHandler function.
>
> What am I doing wrong? Do I have to preprocess the file to
> change this to a character reference?
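You are doing nothing wrong, and you do not need to preprocess the file.
Expat decodes the encoding declared in your document, but it does
not report the data to your handlers in that same encoding. It
always reports it as Unicode: UTF-8 by default, or UTF-16 if Expat
was built with XML_UNICODE. The two bytes 0xC3 0xBC are the UTF-8
encoding of U+00FC (u umlaut). A character reference (&#xFC;) would
be reported the same way. This is standard behaviour for an XML parser.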
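
To make the byte values concrete, here is a minimal sketch in plain C
that does not call Expat at all. The function names latin1_to_utf8 and
utf8_to_latin1 are hypothetical helpers made up for this illustration,
not part of the Expat API. The first shows why byte 0xFC becomes
0xC3 0xBC; the second shows one way a handler could turn UTF-8 text
back into ISO-8859-1 if the application really needs that, assuming a
default (UTF-8) build of Expat.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode one ISO-8859-1 byte as UTF-8.
 * Returns the number of bytes written to out (1 or 2). */
size_t latin1_to_utf8(unsigned char c, unsigned char out[2])
{
    assert(out != NULL);
    if (c < 0x80) {                 /* ASCII is unchanged in UTF-8 */
        out[0] = c;
        return 1;
    }
    /* U+0080..U+00FF: lead byte 110xxxxx, continuation byte 10xxxxxx */
    out[0] = (unsigned char)(0xC0 | (c >> 6));
    out[1] = (unsigned char)(0x80 | (c & 0x3F));
    return 2;
}

/* Hypothetical helper: convert len bytes of UTF-8 to ISO-8859-1.
 * The input is length-delimited, not NUL-terminated. out must have
 * room for at least len bytes. Returns the number of bytes written,
 * or (size_t)-1 if the input is malformed or contains a character
 * above U+00FF, which ISO-8859-1 cannot represent. */
size_t utf8_to_latin1(const char *s, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = (unsigned char)s[i];
        if (b < 0x80) {
            out[n++] = b;
        } else if ((b == 0xC2 || b == 0xC3) && i + 1 < len
                   && ((unsigned char)s[i + 1] & 0xC0) == 0x80) {
            /* Two-byte sequence for U+0080..U+00FF */
            out[n++] = (unsigned char)(((b & 0x03) << 6)
                                       | ((unsigned char)s[i + 1] & 0x3F));
            i++;                    /* skip the continuation byte */
        } else {
            return (size_t)-1;
        }
    }
    return n;
}
```

A character data handler receives a pointer and a length rather than a
NUL-terminated string, so a helper of this shape could be called on
those two arguments directly. The caller has to decide what to do when
it returns (size_t)-1, since the document may contain characters that
ISO-8859-1 has no byte for.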
Karl