os.listdir, gets unicode, returns unicode... USUALLY?!?!?

"Martin v. Löwis" martin at v.loewis.de
Mon Nov 20 01:55:23 EST 2006


Ross Ridge wrote:
> Martin v. Löwis wrote:
>> One approach I had been considering is to always make the decoding
>> succeed, by using the private-use-area of Unicode to represent bytes
>> that don't decode correctly.
> 
> That would conflict with private use characters appearing in file
> names.

Not necessarily: they could get escaped.

AFAICT, you can have that conflict only if the file system encoding
is UTF-8: otherwise, there is no way to represent them.
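
To make the escaping idea concrete, here is a minimal sketch in present-day
Python 3. It is only an illustration under stated assumptions, not an
existing API: the block U+F700..U+F7FF and the names decode_filename and
encode_filename are hypothetical choices, and the round trip is only
claimed for UTF-8. Bytes that fail to decode are mapped into the reserved
block, and genuine characters from that block are escaped by treating
their own encoded bytes the same way, so nothing is ambiguous on the way
back.

```python
# Hypothetical reserved block: U+F700 + byte value stands for one raw byte.
ESCAPE_BASE = 0xF700


def _is_reserved(ch):
    """True if ch lies in the block reserved for escaped bytes."""
    return ESCAPE_BASE <= ord(ch) <= ESCAPE_BASE + 0xFF


def _escape(raw):
    """Map each raw byte to one character in the reserved block."""
    return "".join(chr(ESCAPE_BASE + b) for b in raw)


def decode_filename(raw, encoding="utf-8"):
    """Decode raw bytes so that decoding never fails."""
    pieces = []
    pos = 0
    while pos < len(raw):
        try:
            good = raw[pos:].decode(encoding)
            bad = b""
            consumed = len(raw) - pos
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are offsets into raw[pos:].
            good = raw[pos:pos + exc.start].decode(encoding)
            bad = raw[pos + exc.start:pos + exc.end]
            consumed = exc.end
        for ch in good:
            if _is_reserved(ch):
                # A genuine private-use character: escape its encoded bytes.
                pieces.append(_escape(ch.encode(encoding)))
            else:
                pieces.append(ch)
        pieces.append(_escape(bad))
        pos += consumed
    return "".join(pieces)


def encode_filename(name, encoding="utf-8"):
    """Inverse of decode_filename: reserved characters become raw bytes."""
    out = bytearray()
    for ch in name:
        if _is_reserved(ch):
            out.append(ord(ch) - ESCAPE_BASE)
        else:
            out += ch.encode(encoding)
    return bytes(out)
```

With this sketch, a Latin-1 name such as b"caf\xe9.txt" decodes without an
error under UTF-8, and encode_filename() gives back the original bytes. A
file name that really contains U+F7E9 also survives the round trip, at the
cost of being shown as three escape characters.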

> Personally, I think os.listdir() should return the file names only in
> Unicode if they're actually stored that way in the underlying file
> system (eg. NTFS), otherwise return them as byte strings.  I doubt
> anyone in this thread would like that, though.

So I assume you would not want to allow passing Unicode strings
to open(), stat(), etc. either, as the _real_ file system API requires
byte strings there as well?

People would indeed see that as a step backwards. If you don't want
Unicode strings returned from listdir, don't pass a Unicode string
as the directory name.
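
As a small illustration of that rule, the sketch below uses present-day
Python 3, where the two types are str and bytes rather than unicode and
str. The helper name listing_types and the file name example.txt are
made up for the example.

```python
import os
import tempfile


def listing_types(directory):
    """List the same directory twice: once by str path, once by bytes path."""
    as_text = os.listdir(directory)                # str in, str names out
    as_bytes = os.listdir(os.fsencode(directory))  # bytes in, bytes names out
    return as_text, as_bytes


with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file so the listing is not empty.
    open(os.path.join(tmp, "example.txt"), "w").close()
    text_names, byte_names = listing_types(tmp)
```

The type of the argument selects the type of the returned names, so a
caller who wants raw byte strings can get them by passing a byte string.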

Technically, how do you determine whether the underlying file
system stores file names "in Unicode"? Does OSX use Unicode
(it requires path names to be UTF-8)? After all, each and
every encoding is a Unicode encoding - that was a design
goal of Unicode.

Regards,
Martin


