[I18n-sig] Re: All this Unicode discussion

Frank Chen frank63@ms5.hinet.net
Sun, 11 Feb 2001 14:02:45 -0000


> Brian and I are worried about all these proposals flying around.
> Americans seem to feel that having Unicode everywhere is
> 'the right thing'. But we have not heard from enough people
> in Japan or in Chinese-speaking countries, and the list has
> NEVER had input from  e.g. Arabic speakers or Eastern Europe.
> 

In fact, some people in mainland China appear to object strongly to the
use of Unicode in Chinese software. Han Unification for CJK reveals a lack
of understanding of CJK ideography. If, in the future, UCS-4 can provide a
complete allocation area for each written language, especially for CJK, I
think it would be fine to use Unicode as the internal data type. I even
wonder whether there is a chance of including ancient Egyptian
hieroglyphs in Unicode, although it is a dead script.

> Is it really desirable, long term, to have Unicode strings as the
> default type in Python?  Do we need separate Unicode file and Binary
> file and socket types? Or are we better with what we have now -
> no fundamental changes, but with codecs and Unicode strings
> when you want them?

As I read the proposal, it does not seem to treat Unicode as the internal
pivot, but as an add-on that applies when an encoding declaration is set.
If no encoding declaration is set, it should behave as before, right? Or
if it is set to Latin-1, it should work like current Python, right? For
now, I can put Big5 characters in Python strings, and Windows or a Chinese
emulator can interpret the Big5 strings correctly when Python displays
them on the screen. I think future versions should keep this working.
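
As a rough illustration of the Big5 case above, here is a minimal sketch.
It assumes a modern Python 3 interpreter and its standard "big5" codec,
not the Python under discussion in this thread, and the sample text is
made up for the example. It shows the same two characters as Unicode
text, as Big5 bytes, and as those bytes misread under Latin-1.

```python
# Sketch only: assumes Python 3 and its standard "big5" codec.
text = "\u4e2d\u6587"              # two CJK characters (hypothetical sample)
raw = text.encode("big5")          # legacy double-byte representation
assert isinstance(raw, bytes)

# Big5 uses two bytes for each of these characters.
print(len(text), len(raw))

# Decoding with the correct encoding recovers the same Unicode text.
roundtrip = raw.decode("big5")
print(roundtrip == text)

# Treating the same bytes as Latin-1 never fails, but every byte becomes
# its own code point, so the result is four unrelated characters.
as_latin1 = raw.decode("latin-1")
print(len(as_latin1))
```

The bytes themselves pass through unchanged either way; only the declared
encoding decides how they are interpreted.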

But I am worried about the conversion time when mapping to Unicode.
Python's start-up initialization may take too long.
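
One way to put a number on the conversion cost is to time it. The sketch
below assumes Python 3 with the standard "timeit" module and "big5"
codec; the sample data and the helper name decode_sample are invented for
the example. It measures decoding only, not interpreter start-up, and the
result depends entirely on the machine.

```python
import timeit

# Sketch only: assumes Python 3 and its standard "big5" codec.
sample = ("\u4e2d\u6587" * 1000).encode("big5")   # 4000 bytes of Big5

def decode_sample():
    """Map the Big5 bytes to a Unicode string (hypothetical helper)."""
    return sample.decode("big5")

# Time 1000 decodes of the sample; no particular figure is implied.
elapsed = timeit.timeit(decode_sample, number=1000)
print("1000 decodes of %d bytes: %.4f s" % (len(sample), elapsed))
```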

> 
> In addition, are there any benefits or problems when you
> deal with double-byte data in Java, VB, or any other languages
> you are familiar with?
> 

I think the main reason Java and Windows use Unicode for internal
processing is quick, universal delivery. Unicode arose for the same
reason: having many local encodings slows down productivity when a product
is distributed worldwide. So if Python wants to ship with i18n & l10n (so
that it can display messages in the local encoding of its environment in
different regions, and the like), it can certainly use Unicode for
efficient delivery.


Frank Chen