Encode exception for chinese text

"Martin v. Löwis" martin at v.loewis.de
Fri May 19 10:42:38 EDT 2006


John Machin wrote:
> 1. *By definition*, you can encode *any* Unicode string into utf-8.
> Proves nothing.
> 2. \u00a0 [no-break space] has no equivalent in gb2312, nor in the
> later gbk alias cp936. It does have an equivalent in the latest Chinese
> encoding, gb18030.

That, too, is true *by definition*, though :-) For those who have not
followed encodings too closely: gb18030 is to gb2312 what UTF-8 is to
ASCII. Both encode all of Unicode in an algorithmic way, and both
provide byte-for-byte identical encodings for their respective
subsets.
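
For anyone who wants to check this at the prompt, here is a minimal
sketch. It assumes a current Python 3 interpreter with the standard
CJK codecs available; the helper name can_encode and the sample
string are made up for illustration only.

```python
def can_encode(text, codec):
    """Return True if text can be encoded with codec, False otherwise."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


nbsp = "\u00a0"  # NO-BREAK SPACE, the character from the quoted message

# gb2312 and gbk reject the character; gb18030 and utf-8 accept it,
# because they cover all of Unicode.
for codec in ("gb2312", "gbk", "gb18030", "utf-8"):
    print(codec, can_encode(nbsp, codec))

# For text inside the older subset, the newer encoding produces the
# same bytes as the older one.
sample = "\u4e2d\u6587"  # two common Chinese characters present in gb2312
print(sample.encode("gb2312") == sample.encode("gb18030"))
print("abc".encode("ascii") == "abc".encode("utf-8"))
```

In practice this means that switching the target encoding from gb2312
or gbk to gb18030 should make the exception go away without changing
the bytes produced for text that already encoded successfully.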

Regards,
Martin


