[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

"Martin v. Löwis" martin at v.loewis.de
Sat Apr 25 14:12:25 CEST 2009


> | 2. Even if they were taken away (which the PEP does not propose to do),
> |    it would be easy to emulate them for applications that want them.
> |    For example, listdir could be wrapped as
> | 
> |    def listdir_b(bytestring):
> |        fse = sys.getfilesystemencoding()
> 
> Alas, no

No, what? No, that algorithm would be incorrect?

> because there is no sys.getfilesystemencoding() at the POSIX
> level. It's only the user's current locale stuff on a UNIX system, and
> has _nothing_ to do with the filesystem because UNIX filesystems don't
> have encodings.

So can you produce a specific example where my proposed listdir_b
function would fail to work correctly?
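
The quoted snippet above is cut off after its second line. As a
minimal sketch of such a wrapper (not necessarily the exact text of
the original proposal), assuming the "surrogateescape" error handler
that this PEP introduces and a POSIX system, it could look like this.
The parameter name dirname_b and the loop variable are illustrative:

```python
import os
import sys


def listdir_b(dirname_b):
    """Yield directory entries as bytes, emulating a bytes-based listdir."""
    fse = sys.getfilesystemencoding()
    # Bytes that cannot be decoded become lone surrogates instead of errors.
    dirname = dirname_b.decode(fse, "surrogateescape")
    for name in os.listdir(dirname):
        # name is a str here; the escaped surrogates turn back into
        # the original bytes on encoding.
        yield name.encode(fse, "surrogateescape")
```

Called as list(listdir_b(b"/some/dir")), this is meant to return the
same byte strings that the operating system reported for that
directory.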

For it to work, it does not matter whether POSIX has a notion of
character sets at the file system level. (The claim that it has none
is actually not true: POSIX very well recognizes the notion of
character sets for file names, and recommends that you restrict
yourself to the portable character set.)

> In particular, because the "best" (or to my mind "misleading") you
> can do for this is report what the current user thinks:
>   http://docs.python.org/library/sys.html#sys.getfilesystemencoding
> then there's no guarantee that what is chosen has any relationship to
> what was in use when the files being consulted were made.

For this PEP, it's irrelevant. It will work even if the chosen encoding
is a bad choice.
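
To illustrate the point, here is a small sketch, assuming the
"surrogateescape" error handler from the PEP and an ASCII-compatible
codec such as UTF-8 or ASCII. The helper name roundtrip is made up
for this example. The bytes are deliberately decoded with the "wrong"
encoding, and still come back unchanged:

```python
def roundtrip(raw, encoding):
    """Decode bytes to str and encode them back, escaping undecodable bytes."""
    text = raw.decode(encoding, "surrogateescape")
    return text.encode(encoding, "surrogateescape")


# A Latin-1 file name that is not valid UTF-8.
raw = "L\u00f6wis".encode("latin-1")           # b'L\xf6wis'

# Decoding as UTF-8 is a "bad choice" here: byte 0xF6 cannot be decoded,
# so it is mapped to the lone surrogate U+DCF6 instead of raising.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same codec and handler restores the original bytes.
restored = text.encode("utf-8", "surrogateescape")
```

The string in between may not display sensibly, but passing it back
to the system yields the original file name.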

> Now, if I were writing listdir_b() I'd want to be able to do something
> along these lines:
>   - set LC_ALL=C (or some equivalent mechanism)
>   - have os.listdir() read bytes as numeric values and transcode their values
>     _directly_ into the corresponding Unicode code points.
>   - yield bytes( ord(c) for c in os_listdir_string )
>   - have os.open() et al transcode unicode code points back into bytes.
> i.e. a straight one-to-one mapping, using only codepoints in the range
> 1..255.

That would be an alternative approach to the same problem (and one that
I think will fail more badly than the one I'm proposing).
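
For comparison, the one-to-one mapping described in the quoted text
behaves like decoding with Latin-1. The sketch below is only an
illustration of that mapping; the helper names are made up, and it is
not a statement of why it would fail. It does show one observable
consequence: the bytes round-trip, but a UTF-8 file name turns into
different characters on the way:

```python
def name_to_text(raw):
    """Map each byte value directly to the code point with the same number."""
    return raw.decode("latin-1")


def text_to_name(text):
    """Map code points 0..255 back to single bytes."""
    return text.encode("latin-1")


# A file name containing U+00E9 (e with acute accent), stored as UTF-8.
utf8_name = "\u00e9".encode("utf-8")           # b'\xc3\xa9'

# The one-to-one mapping yields two characters, U+00C3 and U+00A9,
# rather than the single character the name was created with.
shown = name_to_text(utf8_name)

# The bytes themselves are still recoverable.
restored = text_to_name(shown)
```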

Regards,
Martin

