[Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

James Y Knight foom at fuhm.net
Wed Oct 1 20:30:29 CEST 2008


BTW, Windows will cheerfully let you create and access files with
"garbage surrogates" in their names.
Try it yourself:

import os

open(u"\ud8fd", 'w').close()   # filename is a single lone high surrogate
os.listdir(u'.')               # the listing includes that name

IMO that pretty much blows out of the water any suggestion that
encoding invalid UTF-8 sequences into lone surrogates is an evil and
broken thing to do.

So, I'm back to favoring the lone surrogate plan over the U+0000 plan.  
But either one seems better than the alternatives.
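
To make the lone surrogate plan concrete, here is a minimal sketch of
the round trip. It assumes the "surrogateescape" error handler that
later Python 3 releases provide (it did not exist when this thread was
written), and the byte string is a made-up filename, not one from the
discussion:

```python
# Sketch only: "surrogateescape" is assumed here as a stand-in for the
# lone surrogate plan; the filename below is hypothetical.
raw = b"caf\xff.txt"  # 0xFF can never appear in valid UTF-8

# Each undecodable byte is mapped to a lone low surrogate (0xFF -> U+DCFF).
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))

# Encoding with the same handler turns the surrogate back into the byte,
# so the original filename bytes are recovered exactly.
restored = name.encode("utf-8", "surrogateescape")
print(restored == raw)
```

The point of the sketch is only that the mapping is reversible, so a
name read from the filesystem can be handed back to it unchanged.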

James

On Sep 29, 2008, at 11:11 PM, Stephen J. Turnbull wrote:

> James Y Knight writes:
>> On Sep 29, 2008, at 3:32 AM, Adam Olsen wrote:
>
>>> UTF-8b doesn't work as intended.  It produces an invalid unicode
>>> object (garbage surrogates) that cannot be used with external APIs
>>> or libraries that require unicode.
>>
>> I'd be interested to hear more detail on what you expect the
>> practical ramifications of this to be. It doesn't sound likely to
>> be a problem to me.
>
> That's because you have a specific use case in mind.  Adam clearly has
> in mind passing the filename on to a library which might proceed to
> signal an error (to him, unexpected) on garbage surrogates.  He
> doesn't want to be surprised by that.
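
The kind of surprise described above can be sketched as follows. This
is an illustration under assumptions, not anything posted in the
thread: it uses Python 3's strict UTF-8 encoder as a stand-in for "a
library that requires unicode", and the filename is hypothetical:

```python
# Sketch only: the strict UTF-8 codec plays the role of an external
# library that insists on well-formed Unicode; the name is hypothetical.
name = b"caf\xff.txt".decode("utf-8", "surrogateescape")

try:
    name.encode("utf-8")  # strict error handling is the default
    strict_ok = True
except UnicodeEncodeError as exc:
    # The lone surrogate U+DCFF has no valid UTF-8 encoding.
    strict_ok = False
    print("rejected:", exc.reason)
```

Whether this matters in practice depends on how often such a name
actually reaches code that performs a strict conversion.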


