[ expat-Bugs-491986 ] Charset decoding error

noreply@sourceforge.net noreply@sourceforge.net
Wed Dec 12 04:57:01 2001


Bugs item #491986, was opened at 2001-12-12 04:48
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=110127&aid=491986&group_id=10127

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Bent Jensen (bentjensen)
Assigned to: Nobody/Anonymous (nobody)
Summary: Charset decoding error

Initial Comment:

When parsing XML containing Danish letters (æøåÆØÅ) 
stored as 8-bit characters, with the encoding declared 
as <?xml version="1.0" encoding="iso-8859-1"?>, the 
parser goes wrong. If the input is:

  <person id="five.worker">
    <name><family>Worker</family> <given>Five</given></name>
    <email>J&oslash;rgen five@foo.com</email>
    <email>Jørgen five@foo.com</email>
    <link manager="Big.Boss"/>
  </person>


(Note that the Danish letter ø appears in two forms: as 
the entity reference &oslash; and as a raw 8-bit character.)

The output is:

    START: email
CD: (null) - 'J' - 1
CD: (null) - 'rgen five@foo.com' - 17
END: email
CD: (null) - '
' - 1
CD: (null) - '    ' - 4
    START: email
CD: (null) - 'JÃ¸rgen five@foo.com' - 20
END: email
CD: (null) - '
' - 1
CD: (null) - '    ' - 4

What am I doing wrong?

If I embed the string 'æøåÆØÅ' in the XML file, it 
works fine?!?!

I have modified the 'outline' example program for the 
above test.

Sincerely,
Bent Jensen, Senior consultant.
bent@kiya.dk




----------------------------------------------------------------------

>Comment By: Bent Jensen (bentjensen)
Date: 2001-12-12 04:56

Message:
Logged In: YES 
user_id=392963

Info: The expat package (version 1.95.2) was built on 
alpha/axp OSF1 4.0D with gcc version 2.95.3. The test was 
run on the same machine.

----------------------------------------------------------------------
