[Python-Dev] (Not) delaying the 3.2 release

Guido van Rossum guido at python.org
Thu Sep 16 19:56:56 CEST 2010


On Thu, Sep 16, 2010 at 10:46 AM, Martin (gzlist) <gzlist at googlemail.com> wrote:
> On 16/09/2010, Guido van Rossum <guido at python.org> wrote:
>>
>> In all cases I can imagine where such polymorphic functions make
>> sense, the necessary and sufficient assumption should be that the
>> encoding is a superset of 7-bit(*) ASCII. This includes UTF-8, all
>> Latin-N variants, and AFAIK also the popular CJK encodings other than
>> UTF-16. This is the same assumption made by Python's bytes type when
>> you use "character-based" methods like lower().
>
> Well, depends on what exactly you're doing, it's pretty easy to go wrong:
>
> Python 3.2a2+ (py3k, Sep 16 2010, 18:43:45) [MSC v.1500 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import os, sys
> >>> os.path.split("C:\\十")
> ('C:\\', '十')
> >>> os.path.split("C:\\十".encode(sys.getfilesystemencoding()))
> (b'C:\\\x8f', b'')
>
> Similar things can catch out web developers once they step outside the
> percent encoding.

Well, that character is not 7-bit ASCII. Of course things will go
wrong there. That's the whole point of what I said, isn't it?
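
A minimal sketch of the failure mode discussed above, for readers who want
to reproduce it off Windows. It assumes the filesystem encoding in the quoted
session was cp932 (the thread does not say so; the b'\x8f' in the output is
consistent with it), and it uses ntpath so the Windows splitting rules apply
on any platform. The helper names split_both and has_ascii_trail_bytes are
made up for this illustration.

```python
import ntpath  # Windows path rules, importable on any platform


def split_both(path, encoding):
    """Split `path` once as str and once as bytes encoded with `encoding`."""
    return ntpath.split(path), ntpath.split(path.encode(encoding))


def has_ascii_trail_bytes(text, encoding):
    """True if any non-ASCII character encodes to a byte below 0x80."""
    return any(
        byte < 0x80
        for char in text if ord(char) > 0x7F
        for byte in char.encode(encoding)
    )


path = "C:\\十"

# Assumed encoding: in cp932 the second byte of this character is 0x5C,
# the same value as the ASCII backslash, so the bytes split lands
# in the middle of the character.
str_cp932, bytes_cp932 = split_both(path, "cp932")

# In UTF-8 every byte of a non-ASCII character is >= 0x80, so the
# bytes split agrees with the str split.
str_utf8, bytes_utf8 = split_both(path, "utf-8")

print(bytes_cp932)
print(bytes_utf8)
print(has_ascii_trail_bytes("十", "cp932"), has_ascii_trail_bytes("十", "utf-8"))
```

The check in has_ascii_trail_bytes is one way to test whether a given
string is safe for byte-level, ASCII-based processing in a given encoding;
it says nothing about the encoding as a whole.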

-- 
--Guido van Rossum (python.org/~guido)

