[Expat-discuss] Character Encoding 4 bytes Limitation
Karl Waclawek
karl at waclawek.net
Mon Aug 7 17:10:30 CEST 2006
chandan kumar wrote:
> Hi All,
>
>
> The expat doc/reference.html mentions these limitations for character encodings.
> -----
> Expat places restrictions on character encodings that it can support by filling in the XML_Encoding structure:
>
> 2. Characters must be encoded in 4 bytes or less.
> 3. All characters encoded must have Unicode scalar values less than or equal to 65535 (0xFFFF). This does not apply to the built-in support for UTF-16 and UTF-8.
> ------
>
> Some of the Chinese characters fall beyond this range. Does this mean that expat cannot parse all the Chinese characters?
>
Expat can parse all Chinese characters as long as they are encoded in
UTF-16 or UTF-8.
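These limitations apply only to other encodings, i.e. those described
to Expat through the XML_Encoding structure, not to its built-in
UTF-8 and UTF-16 support.
Someone has supplied an Expat patch to support the GB2312 encoding. See
patch #888879.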
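
As a quick check of the point above, here is a minimal sketch using Python's standard xml.parsers.expat module, which wraps Expat. The helper name collect_text and the sample character U+20000 (a CJK ideograph above 0xFFFF) are chosen only for illustration and do not come from the Expat documentation.

```python
import xml.parsers.expat


def collect_text(xml_bytes):
    """Parse xml_bytes with Expat and return all character data as one string."""
    parser = xml.parsers.expat.ParserCreate()
    chunks = []
    # Expat may deliver character data in several pieces, so collect them all.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_bytes, True)
    return "".join(chunks)


# U+20000 lies outside the 16-bit range (scalar value > 0xFFFF).
SAMPLE = "\U00020000"

# The same document, once encoded as UTF-8 and once as UTF-16 (with BOM).
doc_utf8 = ('<?xml version="1.0" encoding="UTF-8"?><t>%s</t>' % SAMPLE).encode("utf-8")
doc_utf16 = ('<?xml version="1.0" encoding="UTF-16"?><t>%s</t>' % SAMPLE).encode("utf-16")

text_utf8 = collect_text(doc_utf8)
text_utf16 = collect_text(doc_utf16)

print(hex(ord(text_utf8)), hex(ord(text_utf16)))
```

Both documents go through Expat's built-in UTF-8 and UTF-16 support, so the 0xFFFF restriction quoted above is not involved.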
>
> Is there any expat document providing the list of characters supported?
>
There are source code comments in expat.h for the XML_Encoding
structure, but not a list.
Karl