[Expat-discuss] Parsing copyright symbol
Warren Young
warren at etr-usa.com
Tue Jul 8 01:56:17 CEST 2008
М.В. wrote:
>
> ...copyright symbol (code 0xae)...utf8?
You are confused on a number of fronts:
First, 0xAE is not a valid UTF-8 code, by itself. Read this on how
UTF-8 encodes characters above U+007F as multiple bytes, each 0x80 or
greater in value:
http://en.wikipedia.org/wiki/UTF-8
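To make the first point concrete, here is a minimal Python sketch (Python is my choice for illustration, and the variable names are made up). It tries to decode a lone 0xAE byte as UTF-8, then shows the two bytes UTF-8 actually uses for U+00AE:

```python
# A lone 0xAE byte is a UTF-8 continuation byte, so it cannot stand alone.
lone_byte = bytes([0xAE])
try:
    lone_byte.decode("utf-8")
    lone_is_valid_utf8 = True
except UnicodeDecodeError:
    lone_is_valid_utf8 = False

# U+00AE is above U+007F, so UTF-8 spends two bytes on it, both >= 0x80.
encoded = "\u00ae".encode("utf-8")

print(lone_is_valid_utf8)  # False
print(encoded.hex())       # c2ae
```

The second byte of the encoded form is 0xAE, which is probably why the bare byte looks plausible at first glance.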
Second, 0xAE is the registered trademark symbol in ISO 8859-1 (Latin-1),
not a copyright symbol. The copyright symbol is 0xA9 in Latin-1.
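You can check both code points yourself. This is a small sketch using Python's standard codecs and unicodedata module, with illustrative variable names:

```python
import unicodedata

# Decode each single byte as ISO 8859-1 and ask Unicode for its name.
registered = bytes([0xAE]).decode("iso-8859-1")
copyright_sign = bytes([0xA9]).decode("iso-8859-1")

print(unicodedata.name(registered))      # REGISTERED SIGN
print(unicodedata.name(copyright_sign))  # COPYRIGHT SIGN
```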
Third, XML defaults to UTF-8, so unless you declare the document's
character set differently in the XML declaration (<?xml ...?>), that's
what Expat will use.
Either convert your data into UTF-8 format, or tell Expat the truth
about your document's content:
<?xml version="1.0" encoding="iso-8859-1"?>
I'm just guessing about it being 8859-1. It could be 8859-15, or
probably several other encodings.
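Here is a rough sketch of the difference the declaration makes, using Python's xml.parsers.expat binding to Expat rather than the C API. The parse_text helper and the sample <notice> document are hypothetical, and I am assuming Latin-1 data as above:

```python
import xml.parsers.expat


def parse_text(document: bytes) -> str:
    """Parse the document with Expat and return its character data."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)  # True marks this as the final chunk
    return "".join(chunks)


# The same Latin-1 bytes, with and without an encoding declaration.
body = b"<notice>\xa9 2008</notice>"
declared = b'<?xml version="1.0" encoding="iso-8859-1"?>' + body

print(parse_text(declared))

try:
    parse_text(body)
    undeclared_ok = True
except xml.parsers.expat.ExpatError as err:
    # With no declaration Expat assumes UTF-8 and rejects the lone 0xA9.
    undeclared_ok = False
    print("undeclared input rejected:", err)
```

With the declaration, the handler receives the copyright sign as ordinary text. Without it, the parse fails on that byte.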