os.lisdir, gets unicode, returns unicode... USUALLY?!?!?
Marc 'BlackJack' Rintsch
bj_666 at gmx.net
Fri Nov 17 07:11:58 EST 2006
In <4cefe$455d8f47$59ad1aca$3993 at news.flashnewsgroups.com>, gabor wrote:
> Marc 'BlackJack' Rintsch wrote:
>> In <mailman.294.1163721712.32031.python-list at python.org>, Jean-Paul
>> Calderone wrote:
>>
>>>> How would you propose listdir should behave?
>>> Umm, just a wild guess, but how about raising an exception which includes
>>> the name of the file which could not be decoded?
>>
>> Suppose you have a directory with just some files having a name that can't
>> be decoded with the file system encoding. So `listdir()` fails at this
>> point and raises an exception. How would you get the names then? Even the
>> ones that *can* be decoded? This doesn't look very nice:
>>
>>     path = u'some path'
>>     try:
>>         files = os.listdir(path)
>>     except UnicodeError, e:
>>         files = os.listdir(path.encode(sys.getfilesystemencoding()))
>>         # Decode and filter the list "manually" here.
>
> i agree that it does not look very nice.
>
> but does this look nicer? :)
>
>     path = u'some path'
>     files = os.listdir(path)
>
>     def check_and_fix_wrong_filename(file):
>         if isinstance(file, unicode):
>             return file
>         else:
>             # somehow convert it to unicode, and return it
>
>     files = [check_and_fix_wrong_filename(f) for f in files]
I think this is very "special" code, because you can't use the fixed names
to open the files anymore unless you guess the encoding correctly. That
makes it a bit fragile. Wouldn't it be a better solution to encode `path`
with the file system encoding before getting the file names? `listdir()`
then returns every name as a byte string, so you can use all of the names
to process the files.
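A minimal sketch of that approach, using only `os.listdir()` and
`sys.getfilesystemencoding()` as in the snippets above. The helper names
`list_names_as_bytes` and `display_name` are made up for illustration. The
names stay byte strings for all file operations and are decoded, lossily,
only when they have to be shown to someone:

```python
import os
import sys


def list_names_as_bytes(path):
    """Return the entries of `path` as byte strings.

    Hypothetical helper: a text (unicode) `path` is encoded with the
    file system encoding first, so `os.listdir()` receives a byte
    string and hands back byte strings without decoding anything.
    """
    if not isinstance(path, bytes):
        path = path.encode(sys.getfilesystemencoding())
    return os.listdir(path)


def display_name(name):
    """Decode a byte string name for display only.

    Undecodable bytes become replacement characters, so the result
    must not be used to open the file again.
    """
    return name.decode(sys.getfilesystemencoding(), 'replace')
```

To open a file, join the encoded directory with the untouched byte string
name, e.g. `os.path.join(path.encode(sys.getfilesystemencoding()), name)`,
and keep `display_name()` for messages and logs.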
> in other words, your opinion is that the proposed solution is not
> optimal, or that the current behavior is fine?
I think the current behavior is okay but should be documented.
Maybe I just haven't had enough use cases yet that needed the names as
unicode objects, and in my experience with Linux file systems, file names
are just byte strings with two limitations: no slashes and no zero bytes. :-)
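To illustrate, here is a small sketch that checks only those two rules and
ignores every other limit a real file system may impose (length, for
example). The function names and the sample name are assumptions made up
for this example. It shows a name that is perfectly acceptable as a byte
string yet cannot be decoded as UTF-8:

```python
def obeys_name_rules(name):
    """True if the byte string `name` has no slash and no zero byte."""
    return b'/' not in name and b'\0' not in name


def decodes_as(name, encoding):
    """True if the byte string `name` can be decoded with `encoding`."""
    try:
        name.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# A name written by a program that used Latin-1 (hypothetical example).
latin1_name = u'caf\xe9.txt'.encode('latin-1')

assert obeys_name_rules(latin1_name)         # fine as a byte string
assert decodes_as(latin1_name, 'latin-1')    # readable with the right guess
assert not decodes_as(latin1_name, 'utf-8')  # but not valid UTF-8
```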
Ciao,
Marc 'BlackJack' Rintsch