[Patches] Unicode changes.

M.-A. Lemburg mal@lemburg.com
Tue, 28 Mar 2000 14:48:49 +0200


Thomas Heller wrote:
> 
> > Thanks for the pointer. It seems as if MBCS is really just
> > another name for DBCS (the whole document speaks about DBCS
> > and only mentions MBCS as an alias).
> >
> > If this is true, then I don't know how the MBCS will actually
> > work, since DBCS is a class of encodings, not a single one.
> > Andy Robinson and friends on the i18n sig list are already
> > looking into writing codecs for the various encodings using
> > DBCS as basis.
> >
> > Looks like you just made Python CJK-aware on Windows by virtue
> > of letting the Win32 API decide which the current encoding is ;-).
> >
> 
> Is this the answer (from the article mentioned above)?
> 14. Tips
> 14.1. Development
> Don't mix and match Win32® APIs with CRT APIs
> The difference is that Win32 APIs rely on System information, whereas the CRT APIs rely on the user to initialize for the
> appropriate settings. CRT defaults to ANSI "C" locale.
> 
> For example: The following example will fail even if psz really points to a Japanese lead byte and the system is running in a
> different codepage other than 932.
> 
> // **** undefined behaviour ****
> setlocale(LC_ALL, "Japanese");  // set run-time to Japanese locale
> if (IsDBCSLeadByte(*psz))       // query system locale *** wrong ***
> ....
> // **** correct behaviour ****
> if (isleadbyte((_TXCHAR)*psz))  // correct locale is used. Also note that
>                                 // (_TXCHAR) casting was used to make sure
>                                 // integral conversion is correct

Hmm, Mark's mbcs codec uses the Win32 API, so if the user has set
the system up for a CJK code page, then that code page will be
used... not really portable, but we have Unicode for that anyway :-)

On the positive side, the system will automagically use the right
encoding for the user's machine. On the negative side, a Python
script cannot use the mbcs codec to encode data in a predefined
code page, so the data is probably only usable on the machine
that produced it.

The Asian codec package will take care of the latter though,
so it's not as bad as it may look.
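
To make the trade-off concrete, here is a minimal sketch. It is written
in present-day Python syntax and assumes a codec named "cp932" is
available (the kind of codec the Asian codec package is meant to
provide). The helper encode_for_this_machine and its UTF-8 fallback on
non-Windows platforms are hypothetical, added only for illustration.

```python
import sys

text = "\u65e5\u672c\u8a9e"  # the word "Japanese", written in Japanese

# An explicitly named codec produces the same bytes on every machine,
# so the result can be exchanged between systems.
portable = text.encode("cp932")


def encode_for_this_machine(s):
    """Encode with whatever code page the Windows system is set to.

    Hypothetical helper: the mbcs codec only exists on Windows, so this
    sketch falls back to UTF-8 elsewhere (an assumption, not part of
    the mbcs codec itself).
    """
    if sys.platform == "win32":
        # Characters missing from the system code page become "?".
        return s.encode("mbcs", errors="replace")
    return s.encode("utf-8")


# These bytes depend on the machine's settings; another machine may
# not be able to decode them correctly.
local = encode_for_this_machine(text)

print(len(portable), len(local))
```

The bytes in portable can always be decoded again with "cp932", while
the bytes in local are only meaningful together with the knowledge of
which code page the producing machine was using.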

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/