[Expat-bugs] [ expat-Bugs-624251 ] Problem parsing Latin-1 symbols

noreply@sourceforge.net noreply@sourceforge.net
Wed, 16 Oct 2002 12:23:49 -0700


Bugs item #624251, was opened at 2002-10-16 15:03
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=624251&group_id=10127

Category: None
Group: None
Status: Open
>Resolution: Rejected
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Problem parsing Latin-1 symbols

Initial Comment:
Expat 1.95-5

Getting an extra character from the parser when parsing 
extended ASCII characters (161 - 255 decimal).

The XML_CharacterDataHandler function reports 2 
characters for every 1 extended character encountered.

Below is a small XML file demonstrating the problem.
The character handler function reports two characters (0xC2 
and 0xA9) when the xml file contains only one (0xA9).

<?xml version="1.0" encoding="ISO-8859-1" 
standalone="yes"?>
<data>©</data>

Platform: Windows 2000 exe statically linked to expat.

----------------------------------------------------------------------

>Comment By: Karl Waclawek (kwaclaw)
Date: 2002-10-16 15:23

Message:
Logged In: YES 
user_id=290026

Expat reports characters encoded as UTF-8 or UTF-16.
It does not generate ISO-8859-1 output.

What you are reporting looks like UTF-8 encoding,
which means the character 0xA9 will be encoded
in two bytes (0xC2 0xA9). This does not appear to be a bug.
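
If ISO-8859-1 output is needed, the application has to convert the
UTF-8 bytes back itself. Below is a minimal sketch of that conversion,
not anything provided by Expat: the name utf8_to_latin1 is made up for
illustration, and it assumes the buffer it is given contains only whole
UTF-8 sequences. Code points U+0080 to U+00FF are always encoded as a
lead byte 0xC2 or 0xC3 followed by one continuation byte, so 0xC2 0xA9
maps back to the single byte 0xA9.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Expat): convert UTF-8 text, such as
   the data passed to a character data handler, to ISO-8859-1.
   'out' must have room for at least 'len' bytes.
   Returns the number of bytes written, or (size_t)-1 if the input is
   malformed or contains a code point above U+00FF. */
size_t utf8_to_latin1(const char *s, size_t len, unsigned char *out)
{
    size_t i = 0, n = 0;

    assert(s != NULL && out != NULL);

    while (i < len) {
        unsigned char c = (unsigned char)s[i];

        if (c < 0x80) {
            /* Plain ASCII: one byte in both encodings. */
            out[n++] = c;
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len &&
                   ((unsigned char)s[i + 1] & 0xC0) == 0x80) {
            /* Two-byte sequence for U+0080..U+00FF: the low 2 bits of
               the lead byte and the low 6 bits of the continuation
               byte form the Latin-1 value. */
            unsigned char c2 = (unsigned char)s[i + 1];
            out[n++] = (unsigned char)(((c & 0x03) << 6) | (c2 & 0x3F));
            i += 2;
        } else {
            /* Outside Latin-1, or not valid UTF-8. */
            return (size_t)-1;
        }
    }
    return n;
}
```

Such a function would typically be called from the character data
handler on the (s, len) pair it receives. Characters that have no
ISO-8859-1 equivalent cannot be converted, which is why the sketch
reports an error for them instead of guessing a replacement.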

----------------------------------------------------------------------
