[Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

Guido van Rossum guido at python.org
Tue Sep 30 01:41:36 CEST 2008


On Mon, Sep 29, 2008 at 4:29 PM, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> On Monday 29 September 2008 19:06:01 Guido van Rossum, you wrote:
>> >>  - listdir(unicode) -> unicode and raise an error on invalid filename
>>
>> I know I keep flipflopping on this one, but the more I think about it
>> the more I believe it is better to drop those names than to raise an
>> exception. Otherwise a "naive" program that happens to use
>> os.listdir() can be rendered completely useless by a single non-UTF-8
>> filename. Consider the use of os.listdir() by the glob module. If I am
>> globbing for *.py, why should the presence of a file named b'\xff'
>> cause it to fail?
>
> It would be hard for a newbie programmer to understand why he's unable to find
> his very important file ("important r?port.doc") using os.listdir().

*Every* failure in this scenario will be hard to understand for a
newbie programmer. We can just document the fact.

> And yes,
> if your file system is broken, glob(<unicode>) will fail.

Why should it?
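
A minimal sketch of the drop-instead-of-raise behaviour, assuming UTF-8
as the filesystem encoding. decode_or_drop is a hypothetical helper and
the raw names are a hard-coded stand-in for what the OS would return;
this is not how os.listdir() or glob is implemented.

```python
import fnmatch

def decode_or_drop(raw_names, encoding="utf-8"):
    """Decode raw byte names, silently skipping undecodable ones."""
    decoded = []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # The name is not valid in this encoding: drop it, don't raise.
            continue
    return decoded

# Hypothetical directory contents, including one invalid name.
raw_names = [b"a.py", b"\xff", b"b.txt", b"caf\xc3\xa9.py"]
names = decode_or_drop(raw_names)

# Globbing for *.py is unaffected by the presence of b'\xff'.
matches = fnmatch.filter(names, "*.py")
```

The file named b'\xff' is invisible to this code, but the search for
*.py still completes.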

> If we choose to support bytes on Linux, a robust and portable program has to
> use only bytes filenames on Linux to always be able to list and open files.

Right. But such robustness is only needed to support certain odd cases
and we cannot demand that most people bother to write robust code all
the time.
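
For the odd cases that do need it, a robust program can stay in bytes
and decode only for display. The sketch below assumes os.listdir()
returns bytes names when given a bytes path; list_display_names is a
hypothetical helper name.

```python
import os
import sys

def list_display_names(directory=b"."):
    """Return (raw_name, display_text) pairs for a directory.

    raw_name stays as bytes so it can always be passed back to open();
    display_text is a lossy str intended only for showing to the user.
    """
    encoding = sys.getfilesystemencoding()
    pairs = []
    for raw in os.listdir(directory):
        # "replace" substitutes U+FFFD for undecodable bytes.
        pairs.append((raw, raw.decode(encoding, "replace")))
    return pairs

pairs = list_display_names(b".")
```

The raw bytes name is what gets passed to open(); the decoded text is
never used to reach the file.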

> A full example to list files and display filenames:
>
>  import os
>  import os.path
>  import sys
>  if os.path.supports_unicode_filenames:

This is backwards -- the Unicode API is always supported, the bytes
API only on Linux (and perhaps some other Unixes).

>     cwd = os.getcwd()
>  else:
>     cwd = os.getcwdb()
>     encoding = sys.getfilesystemencoding()
>  for filename in os.listdir(cwd):
>     if os.path.supports_unicode_filenames:
>        text = str(filename, encoding, "replace")
>     else:
>        text = filename
>     print("=== File {0} ===".format(text))
>     for line in open(filename):
>        ...
>
> We need an "if" to choose the directory. The second "if" is only needed to
> display the filename. Using bytes, it would be possible to write better code:
> detect the real charset (e.g. ISO-8859-1 in a UTF-8 file system) and so
> display the filename correctly and/or propose to rename the file. Would that
> be possible using UTF-8b / PUA hacks?
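
The charset detection mentioned above could be sketched as follows,
assuming a simple try-each-candidate strategy. guess_decode and its
candidate list are hypothetical; a real detector would need better
heuristics, since ISO-8859-1 accepts any byte sequence.

```python
def guess_decode(raw, candidates=("utf-8", "iso-8859-1")):
    """Decode raw with the first candidate encoding that accepts it.

    Returns (text, encoding_used). If every candidate fails, returns a
    lossy ASCII decoding and None.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    return raw.decode("ascii", "replace"), None

# A Latin-1 encoded name found on a UTF-8 file system.
text, used = guess_decode(b"important r\xe9port.doc")
```

Knowing which encoding matched is what would let a program offer to
rename the file to its UTF-8 form.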

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

