[Expat-discuss] Windows-1252 and Latin-1
Baldur van Lew
blew at medis.nl
Tue Jan 14 16:02:31 EST 2003
Unfortunately 1252 isn't the same as Latin-1 (ISO-8859-1) - this is a
constant source of confusion.
Specifically, in the range 80-9F Windows-1252 defines a number of characters
which do not appear in 8859-1:
"€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ". If you're running Windows, check
the Character Map application with subset Windows Characters.
I assume you have to define your own encoding table to handle this (extended
latin1 table?).
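
The difference is easy to check programmatically. A minimal sketch in Python,
assuming its standard "cp1252" and "latin-1" codecs (Python is not mentioned in
this thread, and the helper name cp1252_extras is made up for illustration):

```python
def cp1252_extras():
    """Map each byte in 0x80-0x9F to its Windows-1252 character.

    Bytes that Python's cp1252 codec leaves unassigned are skipped.
    """
    extras = {}
    for byte in range(0x80, 0xA0):
        try:
            extras[byte] = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            # Unassigned in Python's cp1252 table (0x81, 0x8D, 0x8F, 0x90, 0x9D).
            continue
    return extras


if __name__ == "__main__":
    for byte, char in cp1252_extras().items():
        # Latin-1 maps every byte straight to the code point of the same value,
        # so in this range it yields a C1 control character, not a printable one.
        latin1 = bytes([byte]).decode("latin-1")
        print(f"0x{byte:02X}  cp1252={char!r} (U+{ord(char):04X})  "
              f"latin-1=U+{ord(latin1):04X}")
```

A table like this is roughly what a hand-written "extended latin1" decoder
would have to carry for the 80-9F range.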
Baldur van Lew
-----Original Message-----
From: Fred L. Drake, Jr. [mailto:fdrake at acm.org]
Sent: Tuesday, January 14, 2003 3:48 PM
To: swapnel.shrivastava at mentorix.com
Cc: expat-discuss at libexpat.org
Subject: Re: [Expat-discuss] Windows-1252 and Latin-1
Swapnel writes:
> Dear All,
> The problem is: if the encoding scheme specified in the XML document is
> "windows-1252" and the file is saved using Windows 2000 Notepad with the
> encoding set to Unicode, then Expat does not parse the document. But if the
> same file is saved by Notepad with the encoding set to "ANSI" or "UTF-8",
> parsing goes smoothly.
>
> Eg:
> XML document : <?xml version="1.0" encoding="windows-1252"?>.
> The editor used to save this document is Windows 2000 Notepad with the
> encoding option set to Unicode. Expat is not able to parse this document.
Hmm. According to this page:
http://www.microsoft.com/globaldev/reference/sbcs/1252.htm
"Windows-1252" is a synonym for Latin-1, or ISO-8859-1 (I didn't
compare the table codepoint-by-codepoint, just trusting the text of
the page). If you use ISO-8859-1 in the XML declaration, Expat should
be perfectly happy. The issue is that Windows-1252 is a non-standard
name for the encoding.
We should consider adding "windows-1252" to the list of supported
encodings for Expat since it is (supposedly) identical with Latin-1.
> Is there any way for Expat to convert an XML document saved with the
> Unicode encoding scheme to ANSI or UTF-8 before parsing it?
In general, you can tell Expat to assume the data is in a particular
encoding by specifying the encoding in the call to XML_ParserCreate(),
and then re-encode the input yourself. Another option is to use the
facilities Expat provides to hook in additional decoders; see the
reference.html file that comes with Expat for API information.
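
The first approach (re-encode the input yourself and tell Expat which encoding
to assume) might look roughly like this. This is only a sketch, using Python's
xml.parsers.expat binding rather than the C API; the function name parse_text
and its source_encoding parameter are made up for illustration, and the caller
is assumed to know how the file was really saved:

```python
import xml.parsers.expat


def parse_text(raw, source_encoding):
    """Return the character data of an XML document given as raw bytes.

    The bytes are decoded by us (not by Expat), re-encoded as UTF-8, and
    parsed with an explicit encoding override.
    """
    text = raw.decode(source_encoding)  # e.g. "cp1252" or "utf-16"
    chunks = []
    # Passing an encoding here overrides whatever the XML declaration says.
    parser = xml.parsers.expat.ParserCreate("utf-8")
    # Character data may arrive in several pieces, so collect and join them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(text.encode("utf-8"), True)
    return "".join(chunks)


if __name__ == "__main__":
    doc = '<?xml version="1.0" encoding="windows-1252"?><price>\u20ac10</price>'
    print(parse_text(doc.encode("cp1252"), "cp1252"))
    print(parse_text(doc.encode("utf-16"), "utf-16"))
```

Because the override takes precedence, the "windows-1252" name in the
declaration is never looked up, whichever way the file was actually saved.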
-Fred
--
Fred L. Drake, Jr. <fdrake at acm.org>
PythonLabs at Zope Corporation