[Python-Dev] Re: \ud800 crashes interpreter (PR#384)

M.-A. Lemburg mal@lemburg.com
Wed, 05 Jul 2000 11:27:21 +0200


Fredrik Lundh wrote:
> 
> mal wrote:
> > > Given the new 7-bit-ASCII-as-default-encoding-for-8-bit-strings
> > > convention, shouldn't just hashing the character values work
> > > fine?  That is, hash('abc') should == hash(u'abc'), no conversion
> > > required.
> >
> > Yes, and it does so already for pure ASCII values. The problem
> > comes from the fact that the default encoding can be changed to
> > a locale specific value (site.py does the lookup for you), e.g.
> > given you have defined LANG to be en_US, Python will default
> > to Latin-1 as default encoding.
> 
> footnote: in practice, this is a Unix-only feature.
> 
> I suggest adding code to the _locale module (or maybe sys is
> better?) which can be used to dig up a suitable encoding for
> non-Unix platforms.  On Windows, the code page should be
> "cp%d" % GetACP().
> 
> I'll look into this later today.

Could you add code to the _locale module which interfaces
to GetACP() on win32?

locale.get_default could then make use of this API to figure
out the encoding.
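
To make the idea concrete, here is a rough sketch of such a lookup. It is
written in present-day Python and uses ctypes only as a stand-in for the
proposed _locale interface; the names code_page_to_encoding and
guess_default_encoding are hypothetical, and falling back to ASCII when
nothing better is known is an assumption of the sketch, not settled behaviour.

```python
import codecs
import sys


def code_page_to_encoding(code_page):
    """Format a Windows ANSI code page number as a codec name (1252 -> 'cp1252')."""
    return "cp%d" % code_page


def guess_default_encoding(fallback="ascii"):
    """Return a codec name derived from GetACP() on win32, else `fallback`.

    Hypothetical helper: non-Windows platforms are not handled here and
    simply get the fallback.
    """
    if sys.platform != "win32":
        return fallback

    import ctypes  # stand-in for the proposed _locale wrapper around GetACP()

    name = code_page_to_encoding(ctypes.windll.kernel32.GetACP())
    try:
        # Only report encodings that Python actually has a codec for.
        codecs.lookup(name)
    except LookupError:
        return fallback
    return name
```

Checking the name against codecs.lookup() keeps the function from reporting
a code page for which no codec is installed.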

Ideally there would be another win32 API for querying the
active language (it would have to return an RFC 1766
language code, or we could add aliases to the locale_alias
database).
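
As an illustration of the alias route, the sketch below turns a locale name
into an RFC 1766 style tag. The function name locale_to_rfc1766 and the
EXAMPLE_ALIASES table are hypothetical; the single alias entry is only there
to show the shape of the data and is not taken from the real locale_alias
database.

```python
# Hypothetical alias table: platform-specific language names -> POSIX-style names.
EXAMPLE_ALIASES = {
    "english_united states": "en_US",
}


def locale_to_rfc1766(locale_name, aliases=EXAMPLE_ALIASES):
    """Map a locale name such as 'en_US.ISO8859-1' to a tag such as 'en-US'.

    Returns None for the 'C' and 'POSIX' locales and for empty names,
    which carry no language information.
    """
    # Drop the encoding ('.ISO8859-1') and modifier ('@euro') parts.
    base = locale_name.split(".", 1)[0].split("@", 1)[0]
    # Resolve platform-specific spellings through the alias table.
    base = aliases.get(base.lower(), base)
    if base in ("", "C", "POSIX"):
        return None
    parts = base.split("_", 1)
    language = parts[0].lower()
    if len(parts) == 2 and parts[1]:
        return "%s-%s" % (language, parts[1].upper())
    return language
```

With this table, both "en_US.ISO8859-1" and "English_United States.1252"
come out as "en-US".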

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/