[Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?
James Y Knight
foom at fuhm.net
Mon Sep 29 18:16:32 CEST 2008
On Sep 29, 2008, at 3:32 AM, Adam Olsen wrote:
> On Sun, Sep 28, 2008 at 10:43 PM, James Y Knight <foom at fuhm.net> wrote:
>> [1] UTF-8b has a similar property to 8859-1, in that all byte strings can be
>> successfully round-tripped. It's not currently implemented in python core,
>> but it's a pretty trivial encoding, and is available under the BSD license,
>> see below.
>
> UTF-8b doesn't work as intended. It produces an invalid unicode
> object (garbage surrogates) that cannot be used with external APIs or
> libraries that require unicode.
I'd be interested to hear more detail on what you expect the practical
ramifications of this to be. It doesn't sound likely to be a problem
to me.
> If you don't need unicode then your
> code should state so explicitly, and 8859-1 is ideal there.
But I *do* want unicode. ALL my filenames are encoded in UTF-8.
Except... that one over there. That's the whole point of UTF-8b:
correctly encoded names get decoded correctly and readably, and the
other cases get decoded into something unique that cannot possibly
conflict.
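
To make the behaviour concrete, here is a minimal sketch. It assumes a Python
version newer than the ones discussed in this thread (3.1 or later), where the
`surrogateescape` error handler from PEP 383 follows the same idea as UTF-8b.
The helper names `decode_name` and `encode_name` are hypothetical, chosen only
for this illustration.

```python
# Sketch only: uses the "surrogateescape" error handler (PEP 383, Python 3.1+)
# as a stand-in for UTF-8b. decode_name/encode_name are hypothetical helpers.

def decode_name(raw: bytes) -> str:
    """Decode a filename; undecodable bytes become lone low surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_name(name: str) -> bytes:
    """Inverse of decode_name: escaped surrogates turn back into raw bytes."""
    return name.encode("utf-8", errors="surrogateescape")


good = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9', valid UTF-8
bad = b"caf\xe9"                    # Latin-1 bytes, not valid UTF-8

# Correctly encoded names decode correctly and readably.
readable = decode_name(good)        # 'caf\u00e9'

# The odd one out decodes to something unique: byte 0xE9 becomes U+DCE9.
escaped = decode_name(bad)          # 'caf\udce9'

# Both names round-trip to their original bytes.
round_trip_ok = encode_name(readable) == good and encode_name(escaped) == bad

# 8859-1 also round-trips, but the valid UTF-8 name comes out unreadable.
latin1_view = good.decode("latin-1")  # 'caf\u00c3\u00a9'

# The objection raised above: a strict encoder rejects the lone surrogate.
try:
    escaped.encode("utf-8")
    strict_rejects = False
except UnicodeEncodeError:
    strict_rejects = True
```

The last step shows the trade-off being argued about: the escaped name cannot
collide with any validly decoded name, but it can only be passed to code that
encodes it with the same error handler.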
James