[Python-Dev] Windows: Remove support of bytes filenames in the os module?

Andrew Barnert abarnert at yahoo.com
Mon Feb 8 13:16:10 EST 2016


On Monday, February 8, 2016 9:11 AM, Alexander Walters <tritium-list at sdamon.com> wrote:


> 
> On 2/8/2016 12:02, Brett Cannon wrote:
>> 
>> 
>>  If Unicode strings don't work in Python 2 then what is Python 2/3 to do 
>>  as a cross-platform solution if we completely remove bytes support in 
>>  Python 3? Wouldn't that mean there is no common type between Python 2 
>>  & 3 that one can use which will work with the os module except native 
>>  strings (which are difficult to get right)?
> 
> The only solution then would be to do `if not PY3: arg = 
> arg.encode(...); os.SOMEFUNC(arg)`, pardon my pseudocode.

That's exactly what you _don't_ want to do.

More generally, the assumption here is wrong. 

It's not true that you can't use Unicode for Windows filenames on Python 2. What is true is that you have to be a lot more careful about using Unicode _consistently_. And that Python 2 gives you very little help in doing so. And some third-party modules may make it harder on you. But if you always use unicode, `os.listdir(u'.')` calls FindFirstFileW instead of FindFirstFileA and gives you back unicode filenames, `os.stat` or `open` call _wstat or _wopen with those unicode filenames, etc.
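
To make "always use unicode" concrete, here is a minimal sketch that should run on both Python 2.7 and Python 3. The helper names `to_text` and `list_and_stat` are made up for illustration, and decoding a bytes path with the filesystem encoding is an assumption about what the caller wants, not something the os module requires.

```python
from __future__ import print_function

import os
import sys

text_type = type(u"")  # unicode on Python 2, str on Python 3


def to_text(path):
    """Hypothetical helper: decode a bytes path, pass text through unchanged."""
    if isinstance(path, bytes):
        # Assumption: the filesystem encoding is the right one for this path.
        return path.decode(sys.getfilesystemencoding() or "utf-8")
    return path


def list_and_stat(directory):
    """Return (name, size) pairs, using a text path from start to finish."""
    directory = to_text(directory)
    results = []
    # A text argument asks os.listdir for text names back.
    for name in os.listdir(directory):
        full = os.path.join(directory, name)
        results.append((name, os.stat(full).st_size))
    return results


if __name__ == "__main__":
    for name, size in list_and_stat(u"."):
        print(repr(name), size)
```

The point is only that the path is converted to text once, at the boundary, and everything downstream (`os.listdir`, `os.path.join`, `os.stat`) sees the same type.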

The problem is that on POSIX, you're often better off using str everywhere, because Python 2.7 doesn't do surrogate escape. And once you're using str on one platform/unicode on the other for filenames, it gets very easy to mix str and unicode in other places (like strings you want to print out for the user or store in a database), and then you're in mojibake hell.
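
As a small illustration of how the mixing goes wrong, the sketch below takes the bytes a POSIX system might hand back for a filename and decodes them with the wrong codec. The filename and the helper `decode_name` are invented for the example; UTF-8 and Latin-1 are just two encodings picked to show the effect.

```python
# -*- coding: utf-8 -*-


def decode_name(raw, encoding):
    """Hypothetical helper: turn a bytes filename into text with a given codec."""
    return raw.decode(encoding)


# The bytes a UTF-8 filesystem would store for the name u"café.txt".
raw_name = u"caf\u00e9.txt".encode("utf-8")

# Decoding with the encoding that was actually used round-trips cleanly.
good = decode_name(raw_name, "utf-8")

# Decoding with a different codec "succeeds" but yields mojibake:
# the two UTF-8 bytes of U+00E9 become two separate Latin-1 characters.
bad = decode_name(raw_name, "latin-1")

assert good == u"caf\u00e9.txt"
assert bad == u"caf\u00c3\u00a9.txt"
assert len(bad) == len(good) + 1
```

Nothing raises in the bad case, which is what makes this kind of bug hard to spot until the string shows up in output or in a database.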

The io module, the pathlib backport, and six can help a bit (at the cost of performance and/or simplicity), but there's no easy answer--if there _were_ an easy answer, we wouldn't have Python 3 in the first place, right?
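
For the io module part, a rough sketch of what "help a bit" can look like: `io.open` takes an explicit encoding and deals in text on both Python 2.7 and Python 3, so file contents at least stay unicode on both. The function name `write_then_read` and the choice of UTF-8 are assumptions for the example.

```python
import io


def write_then_read(path, text):
    """Hypothetical helper: round-trip text through a file using io.open."""
    # io.open with an encoding accepts and returns text (unicode) on
    # Python 2.7 as well as Python 3, unlike the Python 2 builtin open.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with io.open(path, "r", encoding="utf-8") as f:
        return f.read()
```

This only covers file contents; it does nothing about which type you pass around for the filenames themselves.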

