[Python-Dev] Re: \ud800 crashes interpreter (PR#384)
Fredrik Lundh
fredrik@pythonware.com
Wed, 5 Jul 2000 11:14:21 +0200
mal wrote:
> > Given the new 7-bit-ASCII-as-default-encoding-for-8-bit-strings
> > convention, shouldn't just hashing the character values work
> > fine? That is, hash('abc') should == hash(u'abc'), no conversion
> > required.
>
> Yes, and it does so already for pure ASCII values. The problem
> comes from the fact that the default encoding can be changed to
> a locale specific value (site.py does the lookup for you), e.g.
> given you have defined LANG to be us_en, Python will default
> to Latin-1 as default encoding.
Footnote: in practice, this locale-based lookup is a Unix-only feature.
I suggest adding code to the _locale module (or maybe sys is
better?) which can be used to dig up a suitable encoding for
non-Unix platforms. On Windows, the encoding name should be
"cp%d" % GetACP(), i.e. built from the active ANSI code page.
I'll look into this later today.
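A rough sketch of the idea, not the code that would go into _locale
or sys. The helper names code_page_to_encoding and
guess_default_encoding are made up for illustration, and reaching
GetACP() through ctypes is an assumption of this sketch rather than
part of the proposal. On non-Windows platforms it simply defers to
the locale module.

```python
import locale
import sys


def code_page_to_encoding(code_page):
    """Format a Windows code page number as a codec name (1252 -> 'cp1252')."""
    return "cp%d" % code_page


def guess_default_encoding():
    """Hypothetical helper: best-effort guess at a platform default encoding."""
    if sys.platform == "win32":
        # Assumption: call the Win32 GetACP() function via ctypes.
        # ctypes.windll only exists on Windows, so import it lazily.
        import ctypes
        return code_page_to_encoding(ctypes.windll.kernel32.GetACP())
    # Elsewhere, let the locale machinery pick the encoding.
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    print(guess_default_encoding())
```

The formatting step is kept separate from the platform call so it can
be checked on any machine; the value printed depends on the platform
and the current locale settings.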
</F>