[Expat-discuss] Expat treating ISO-8859-1 char strangely?

Stuart Powers stu728 at yahoo.com
Fri Jul 25 09:25:25 EDT 2003


Hi, we're new to this mailing list, and we were wondering if anyone here could help us with a problem we're having.
 
Our XML file (with encoding set to ISO-8859-1) contains the following string:
 
"Kickin’ it Dash style"
 
The apostrophe, we're pretty sure, is a character from the ISO-8859-1 character set. (We got this string for testing by copying and pasting from http://www.zeldman.com/daily/0703b.shtml#anil .)
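One quick way to test that assumption is the short Python sketch below. Python is used here only for illustration; the setup in this post is Perl's XML::DOM. It assumes the pasted apostrophe is U+2019 (RIGHT SINGLE QUOTATION MARK), the usual "curly" apostrophe on web pages, and simply asks each codec whether it can represent that character. The helper `encode_or_none` is a hypothetical name made up for this example.

```python
# Sketch: which encodings can represent the pasted apostrophe?
# Assumption: the character copied from the page is U+2019.
apostrophe = "\u2019"

def encode_or_none(text, encoding):
    """Return the encoded bytes, or None if the encoding lacks the character."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

for enc in ("iso-8859-1", "cp1252", "utf-8"):
    # iso-8859-1 -> None, cp1252 -> b'\x92', utf-8 -> b'\xe2\x80\x99'
    print(enc, encode_or_none(apostrophe, enc))
```

If this holds, the single byte in the file is 0x92 (decimal 146, the same number Mozilla shows), which is the Windows-1252 code for the curly apostrophe; in ISO-8859-1 the same byte is an invisible C1 control character.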
 
We're using XML::DOM (which uses XML::DOM::Parser, which supposedly uses Expat) to parse this XML file, and when we send the parsed data to a browser (via HTTP), it comes out like this:
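Python's `xml.parsers.expat` module wraps the same Expat library, so it can serve as a rough stand-in for what XML::Parser (and hence XML::DOM) receives. The sketch assumes the test file holds byte 0x92 for the apostrophe and declares ISO-8859-1. Expat reports character data in UTF-8, which Python turns into a `str`; re-encoding that as UTF-8 shows the bytes a Perl handler would be given, assuming XML::Parser passes Expat's UTF-8 through unchanged. `DOC` and `parse_text` are hypothetical names for this example.

```python
import xml.parsers.expat

# Hypothetical test document: declared ISO-8859-1, apostrophe stored as 0x92.
DOC = (b'<?xml version="1.0" encoding="ISO-8859-1"?>\n'
       b"<title>Kickin\x92 it Dash style</title>")

def parse_text(doc):
    """Parse doc with Expat and return all character data as one str."""
    parser = xml.parsers.expat.ParserCreate()
    chunks = []
    # Expat may deliver text in several pieces, so collect them and join.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(doc, True)
    return "".join(chunks)

text = parse_text(DOC)
print(repr(text))            # 'Kickin\x92 it Dash style'
print(text.encode("utf-8"))  # b'Kickin\xc2\x92 it Dash style'
```

Under these assumptions, Expat maps byte 0x92 to code point U+0092 (as ISO-8859-1 defines it) and hands it on as the two UTF-8 bytes C2 92.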
 
" KickinÂ’ it Dash style"
 
That is how Mozilla displays it when its character encoding is set to ISO-8859-1. With the encoding set to UTF-8, it simply displays "Kickin#146; it Dash style".
 
We would sort of understand it if Expat simply copied our ISO-8859-1 character byte for byte, or if it converted it to UTF-8 and we got a UTF-8 character. Instead, it appears to be doing neither: it's sending us bytes that don't seem to form a valid character in either character set.
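A plausible reading, sketched below in Python, is that the bytes are in fact UTF-8, just UTF-8 for U+0092 rather than for the curly apostrophe. The sketch assumes the parser output is U+0092 encoded as UTF-8 (bytes C2 92), and assumes the browser treats a page labelled ISO-8859-1 as Windows-1252, which modern browsers do by specification and which matches the display described above.

```python
# Assumption: the parsed string contains U+0092, sent to the browser as UTF-8.
utf8_bytes = "Kickin\u0092 it Dash style".encode("utf-8")

# Page labelled ISO-8859-1, rendered as Windows-1252: C2 -> "Â", 92 -> "’".
as_latin1_label = utf8_bytes.decode("cp1252")

# Page labelled UTF-8: C2 92 decodes back to the invisible control U+0092.
as_utf8 = utf8_bytes.decode("utf-8")

print(as_latin1_label)  # KickinÂ’ it Dash style
print(repr(as_utf8))    # 'Kickin\x92 it Dash style'
```

If this reading is right, each step is behaving consistently, and the surprise comes from byte 0x92 meaning "’" only in Windows-1252, not in the ISO-8859-1 the file declares.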
 
We'd greatly appreciate it if anyone could shed some light on what's happening, give us some advice, or point us to more information that might help. If you don't think that Expat is the source of the problem here, please let us know that too.
 
Thanks for any help you can give us. =)
 
 - John and Stu
 



