os.listdir, gets unicode, returns unicode... USUALLY?!?!?
gabor
gabor at nekomancer.net
Thu Nov 16 18:09:56 EST 2006
Martin v. Löwis wrote:
> gabor wrote:
>
>> or am i using os.listdir the "wrong way"? how do other people deal with
>> this?
>
> You didn't say why the behavior causes a problem for you - you only
> explained what the behavior is.
>
> Most people use os.listdir in a way like this:
>
> for name in os.listdir(path):
>     full = os.path.join(path, name)
>     attrib = os.stat(full)
>     if some-condition:
>         f = open(full)
>     ...
>
> All this code will typically work just fine with the current behavior,
> so people typically don't see any problem.
>
i am sorry, but it will not work. actually this is exactly what i did,
and it did not work. it dies in the os.path.join call: the directory name
is unicode, so the byte-string file_name is implicitly converted to unicode,
and python uses 'ascii' as the charset in such cases. but because listdir
already failed to decode the file_name with the filesystem encoding,
decoding it as 'ascii' usually fails too.
example:
>>> dir_name = u'something'
>>> unicode_file_name = u'\u732b.txt' # the japanese cat-symbol
>>> bytestring_file_name = unicode_file_name.encode('utf-8')
>>>
>>>
>>> import os.path
>>>
>>> os.path.join(dir_name,unicode_file_name)
u'something/\u732b.txt'
>>>
>>>
>>> os.path.join(dir_name,bytestring_file_name)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/posixpath.py", line 65, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 1: ordinal not in range(128)
>>>
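one possible workaround, as a rough sketch only: do the join entirely in byte strings, so that no implicit 'ascii' decode ever happens. the helper name join_as_bytes and its encoding parameter are made up for this illustration, and falling back to 'utf-8' when the filesystem encoding is unknown is an assumption, not something python does for you.

```python
import os
import sys


def join_as_bytes(dir_name, name, encoding=None):
    """Join two path components as byte strings (hypothetical helper).

    Any unicode component is encoded first, so a file name that
    listdir could not decode is passed through untouched.
    """
    # Assumption: fall back to utf-8 if no filesystem encoding is reported.
    enc = encoding or sys.getfilesystemencoding() or 'utf-8'
    text_type = type(u'')
    if isinstance(dir_name, text_type):
        dir_name = dir_name.encode(enc)
    if isinstance(name, text_type):
        name = name.encode(enc)
    # Both components are byte strings now, so nothing gets decoded.
    return os.path.join(dir_name, name)


dir_name = u'something'
bytestring_file_name = u'\u732b.txt'.encode('utf-8')
full = join_as_bytes(dir_name, bytestring_file_name, 'utf-8')
```

the result is a byte string, so it still has to be decoded (with some error handling) before it can be shown to a user.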
gabor