[Expat-discuss] new to expat -- utf-16 encoding

Patrick McCormick patrick@meer.net
Tue Mar 26 15:30:06 2002


> when fed a string that contains...
>
> <?xml version="1.0" encoding="utf-16"?>
>
> I get an error like this...
>
> not well-formed (invalid token) at line 1
>
> If I change this string to contain <?xml version="1.0" encoding="utf-8"?>,
> things work well.  However, the xml content is delivered over a socket from
> a server which is not under my control.

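If the bytes you feed the parser are exactly the string you have above,
that's not UTF-16.  A UTF-16 document starts with a byte order mark
(0xFE 0xFF for big-endian, 0xFF 0xFE for little-endian), and each
character is two bytes wide (four for characters outside the Basic
Multilingual Plane, which are stored as surrogate pairs).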
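
A quick way to check what the server is really sending is to look at
the first two bytes.  The following is only a rough sketch in Python
(not expat code); sniff_utf16_bom is a made-up helper name, and it
assumes you have the raw bytes from the socket in hand:

```python
import codecs

def sniff_utf16_bom(data: bytes):
    """Return 'utf-16-be', 'utf-16-le', or None based on the first two bytes.

    Hypothetical helper for illustration; it only looks for a BOM and does
    not try to guess the encoding of BOM-less data.
    """
    if data.startswith(codecs.BOM_UTF16_BE):   # 0xFE 0xFF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):   # 0xFF 0xFE
        return "utf-16-le"
    return None

decl = '<?xml version="1.0" encoding="utf-16"?>'

# The declaration as plain one-byte-per-character text: claims UTF-16, isn't.
as_ascii = decl.encode("ascii")

# The same declaration as real big-endian UTF-16, BOM first.
as_utf16 = codecs.BOM_UTF16_BE + decl.encode("utf-16-be")

print(sniff_utf16_bom(as_ascii))
print(sniff_utf16_bom(as_utf16))
print(len(as_ascii), len(as_utf16))
```

If the first call comes back with None for your data, the declaration
and the actual bytes disagree, which would fit the "invalid token"
error.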

Also, make sure that if you specify UTF-8 (either explicitly or by
omitting the XML declaration), your generated documents are actually
output in UTF-8, and not ISO-8859-1, which is more common in the US.
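
One way to catch that mistake before the parser does is to try decoding
the output as UTF-8.  Again just a sketch in Python under my own
assumptions: is_valid_utf8 is a hypothetical helper, and the sample
word is only there to have one non-ASCII character in it.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

text = "caf\u00e9"                      # "cafe" with an acute accent on the e

latin1 = text.encode("iso-8859-1")      # the accented e becomes one byte, 0xE9
utf8 = text.encode("utf-8")             # the accented e becomes 0xC3 0xA9

print(is_valid_utf8(latin1))
print(is_valid_utf8(utf8))
```

Keep in mind that pure ASCII passes this check under either encoding, so
it only tells you something once the document contains accented or other
non-ASCII characters.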

My favorite Unicode encoding page is:

http://czyborra.com/utf/