[XML-SIG] Re: Issues with Unicode type

Lars Marius Garshol larsga@garshol.priv.no
25 Sep 2002 17:20:01 +0200


* Martin v. Loewis
| 
| 1. Ignore the problem. This is probably fine: nobody is using non-BMP
|    characters right now. Most systems have serious problems displaying
|    them, since font systems are restricted to 64k glyphs, and, in many
|    cases, to displaying characters in the BMP only.

Actually, Windows 2000 displays non-BMP characters just fine. MSIE can
be made to do it, Opera 6.0 does it just fine, and Mozilla (I think)
does not.

Also, there are locales where non-BMP characters are essential.
Cantonese is probably the best example. You can't write the Cantonese
equivalent of the "-ing" ending using only BMP characters...

Getting this right is more than just an exercise in conformance,
though, as you say, it matters less now than it will in 1-2 years.
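
To make the underlying issue concrete, here is a minimal Python 3 sketch
of how one non-BMP character becomes two UTF-16 code units (a surrogate
pair). The character U+20000 is picked purely as an illustration and is
not one discussed in this thread; the variable names are hypothetical.

```python
# Illustration only: U+20000 is an arbitrary non-BMP code point
# (CJK Extension B), not a character named in this thread.
ch = "\U00020000"

# Encode as big-endian UTF-16 and split into 16-bit code units.
utf16 = ch.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big")
         for i in range(0, len(utf16), 2)]

# A non-BMP character needs a high surrogate (D800-DBFF)
# followed by a low surrogate (DC00-DFFF).
is_pair = (len(units) == 2
           and 0xD800 <= units[0] <= 0xDBFF
           and 0xDC00 <= units[1] <= 0xDFFF)

print([hex(u) for u in units], is_pair)
```

Code that treats each 16-bit unit as a character will see two
"characters" here, which is where the trouble starts.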
 
| 2. Declare that this works correctly in UCS-4 builds of Python
|    only. People who need such characters will use a UCS-4 build of
|    Python, anyway; Guido expects Chinese users to be early adopters
|    here. Notice that James has no such option: Java is inherently tied
|    to UTF-16.

Is the plan that Python will eventually be UCS-4 only?
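
For reference, code can tell the two kinds of build apart at run time
by looking at sys.maxunicode, which is 0xFFFF on a narrow (UTF-16) build
and 0x10FFFF on a wide (UCS-4) one. A small sketch; the helper name
build_kind is hypothetical, not part of any library:

```python
import sys


def build_kind(maxunicode=sys.maxunicode):
    """Classify the interpreter by the largest code point it can hold.

    build_kind is a hypothetical helper for illustration only.
    """
    # Anything above 0xFFFF means single code points can be non-BMP.
    return "wide" if maxunicode > 0xFFFF else "narrow"


print(build_kind())
```

Passing the value in as a parameter is just a convenience so that both
branches can be exercised on a single interpreter.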
 
| 3. Implement it properly. Please understand that you will be trading
|    efficiency for correctness.

:-)
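
As a rough picture of what that trade looks like on a UTF-16
representation: counting characters correctly means scanning for
surrogate pairs rather than just taking the length. This is only a
sketch under that assumption, not how any particular implementation
does it; count_code_points is a hypothetical name.

```python
def count_code_points(units):
    """Count characters in a sequence of UTF-16 code units.

    A high surrogate followed by a low surrogate counts as one
    character; everything else (including a lone surrogate) counts
    as one unit. Hypothetical helper for illustration.
    """
    count = 0
    i = 0
    while i < len(units):
        if (0xD800 <= units[i] <= 0xDBFF
                and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            i += 2  # skip both halves of the pair
        else:
            i += 1
        count += 1
    return count


# "A", then U+20000 as a surrogate pair, then "B".
sample = [0x0041, 0xD840, 0xDC00, 0x0042]
print(len(sample), count_code_points(sample))
```

The plain length is a constant-time lookup, while the correct count
has to walk the whole sequence.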

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
ISO SC34/WG3, OASIS GeoLang TC        <URL: http://www.garshol.priv.no >