[Expat-bugs] [ expat-Bugs-600479 ] error decoding UTF-8 triplet

noreply@sourceforge.net noreply@sourceforge.net
Mon, 26 Aug 2002 17:30:24 -0700


Bugs item #600479, was opened at 2002-08-26 17:34
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=600479&group_id=10127

Category: www.libexpat.org
Group: None
Status: Open
>Resolution: Fixed
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: error decoding UTF-8 triplet

Initial Comment:
On Windows, when reading the UTF-8 sequence "EF BA BF",
utf8_isInvalid3 returns TRUE when it should return FALSE.
This UTF-8 sequence decodes to "FEBF" in UCS-2 (Unicode),
but because utf8_isInvalid3 returns TRUE, an error results
and the character isn't decoded properly.
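
For reference, the arithmetic behind "EF BA BF" -> "FEBF" can be sketched
as below. This is only an illustration: decode_utf8_triplet is a
hypothetical helper, not a function in expat, and it assumes the three
bytes are already known to be a well-formed lead byte plus two
continuation bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of expat): decode one 3-byte UTF-8
   sequence of the form 1110xxxx 10xxxxxx 10xxxxxx into a 16-bit value.
   Assumes the caller has already validated the byte patterns. */
static uint16_t decode_utf8_triplet(const unsigned char *p)
{
    assert(p != 0);
    return (uint16_t)(((p[0] & 0x0Fu) << 12)   /* low 4 bits of lead byte */
                    | ((p[1] & 0x3Fu) << 6)    /* 6 bits of 2nd byte      */
                    |  (p[2] & 0x3Fu));        /* 6 bits of 3rd byte      */
}
```

With the bytes EF, BA, BF the three terms are F000, 0E80 and 003F, which
combine to FEBF.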

This is using expat 1.95.4.

Attached is a simple XML file which illustrates the 
problem.

----------------------------------------------------------------------

>Comment By: Karl Waclawek (kwaclaw)
Date: 2002-08-26 20:30

Message:
Logged In: YES 
user_id=290026

Yes, this is a bug.
utf8_isInvalid3 tries to detect the sequences that are invalid
in XML (*not* invalid Unicode), EF BF BE and EF BF BF, but
only checks the first and third bytes, not the second one.
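
The difference can be sketched roughly as follows. This is not the actual
code from xmltok.c; the function names is_invalid3_buggy and
is_invalid3_fixed are made up for illustration, and the real
implementation may be structured differently.

```c
#include <assert.h>

/* Sketch of the faulty logic described above: only the first and third
   bytes are examined, so every EF xx BE and EF xx BF is rejected. */
static int is_invalid3_buggy(const unsigned char *p)
{
    assert(p != 0);
    return p[0] == 0xEF && (p[2] == 0xBE || p[2] == 0xBF);
}

/* Sketch of a corrected check: the second byte must also be BF, so only
   EF BF BE and EF BF BF are rejected. */
static int is_invalid3_fixed(const unsigned char *p)
{
    assert(p != 0);
    return p[0] == 0xEF && p[1] == 0xBF && (p[2] == 0xBE || p[2] == 0xBF);
}
```

For the reported bytes EF BA BF, the first version returns nonzero (the
reported error) while the second returns 0, and both still reject
EF BF BE and EF BF BF.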

Fix already checked into CVS (xmltok.c 1.23).
Please check out and test.

----------------------------------------------------------------------
