[Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

James Y Knight foom at fuhm.net
Mon Sep 29 18:16:32 CEST 2008


On Sep 29, 2008, at 3:32 AM, Adam Olsen wrote:
> On Sun, Sep 28, 2008 at 10:43 PM, James Y Knight <foom at fuhm.net> wrote:
>> [1] UTF-8b has a similar property to 8859-1, in that all byte strings
>> can be successfully round-tripped. It's not currently implemented in
>> python core, but it's a pretty trivial encoding, and is available
>> under the BSD license, see below.
>
> UTF-8b doesn't work as intended.  It produces an invalid unicode
> object (garbage surrogates) that cannot be used with external APIs or
> libraries that require unicode.

I'd be interested to hear more detail on what you expect the practical  
ramifications of this to be. It doesn't sound likely to be a problem  
to me.

> If you don't need unicode then your
> code should state so explicitly, and 8859-1 is ideal there.

But I *do* want unicode. ALL my filenames are encoded in UTF-8.
Except... that one over there. That's the whole point of UTF-8b:
correctly encoded names get decoded correctly and readably, and the
incorrectly encoded ones get decoded into something unique that cannot
possibly conflict with any correctly encoded name.
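
A minimal sketch of that behaviour, for illustration only. It assumes
Python 3's "surrogateescape" error handler, which was added later (PEP 383)
and follows the UTF-8b idea; it is not the BSD-licensed implementation
mentioned in this thread. The helper names decode_name and encode_name and
the sample filenames are hypothetical.

```python
def decode_name(raw: bytes) -> str:
    # Valid UTF-8 decodes normally; each undecodable byte 0xNN becomes
    # the lone surrogate U+DCNN instead of raising an error.
    return raw.decode("utf-8", errors="surrogateescape")


def encode_name(name: str) -> bytes:
    # Reverse mapping: U+DCNN surrogates turn back into the raw byte 0xNN.
    return name.encode("utf-8", errors="surrogateescape")


good = "café.txt".encode("utf-8")  # correctly encoded UTF-8 filename
bad = b"caf\xe9.txt"               # "that one over there": Latin-1 bytes

good_name = decode_name(good)      # readable text, same as a strict decode
bad_name = decode_name(bad)        # contains a lone surrogate for byte 0xE9

# Both names round-trip to exactly the bytes they came from.
assert encode_name(good_name) == good
assert encode_name(bad_name) == bad

# No valid UTF-8 input decodes to a lone surrogate, so the escaped name
# cannot collide with a correctly encoded one.
assert bad_name != good_name

# A strict encode rejects the escaped name, which is the concern raised
# in the quoted text about passing it to APIs that require valid unicode.
try:
    bad_name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

By comparison, decoding everything as 8859-1 also round-trips every byte
string, but it turns the correctly encoded name into mojibake, which is
why it doesn't help when nearly all names really are UTF-8.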

James