os.listdir, gets unicode, returns unicode... USUALLY?!?!?

gabor gabor at nekomancer.net
Thu Nov 16 18:09:56 EST 2006


Martin v. Löwis wrote:
> gabor schrieb:
> 
>> or am i using os.listdir the "wrong way"? how do other people deal with
>> this?
> 
> You didn't say why the behavior causes a problem for you -  you only
> explained what the behavior is.
> 
> Most people use os.listdir in a way like this:
> 
> for name in os.listdir(path):
>   full = os.path.join(path, name)
>   attrib = os.stat(full)
>   if some-condition:
>     f = open(full)
>   ...
> 
> All this code will typically work just fine with the current behavior,
> so people typically don't see any problem.
> 

i am sorry, but it will not work. this is exactly what i did, and it
failed. it dies in the os.path.join call: the directory path is unicode,
so the byte-string file_name gets implicitly converted to unicode, and
python uses 'ascii' as the charset in such cases. but listdir only returns
a byte string when it has already failed to decode the file_name with the
filesystem-encoding, so decoding it as 'ascii' usually fails as well.

example:

 >>> dir_name = u'something'
 >>> unicode_file_name = u'\u732b.txt' # the japanese cat-symbol
 >>> bytestring_file_name = unicode_file_name.encode('utf-8')
 >>>
 >>>
 >>> import os.path
 >>>
 >>> os.path.join(dir_name,unicode_file_name)
u'something/\u732b.txt'
 >>>
 >>>
 >>> os.path.join(dir_name,bytestring_file_name)
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
   File "/usr/lib/python2.4/posixpath.py", line 65, in join
     path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 1: ordinal not in range(128)
 >>>
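
one possible way around the failure above, as a minimal sketch and not a definitive fix: decode every byte-string name explicitly before joining, so python never falls back to 'ascii'. the helper names to_text and safe_join are made up for this illustration, and falling back to 'utf-8' when no filesystem encoding is reported is an assumption.

```python
import os
import sys


def to_text(name, encoding=None):
    """Return name as text, decoding byte strings explicitly.

    Hypothetical helper: not part of the os module.
    """
    if isinstance(name, bytes):
        # Assumption: use the filesystem encoding, or 'utf-8' if none is reported.
        encoding = encoding or sys.getfilesystemencoding() or 'utf-8'
        # 'replace' substitutes U+FFFD for undecodable bytes instead of raising.
        return name.decode(encoding, 'replace')
    return name


def safe_join(dir_name, file_name, encoding=None):
    """Join two path parts after converting both to text (hypothetical helper)."""
    return os.path.join(to_text(dir_name, encoding), to_text(file_name, encoding))


# The same names as in the session above; the byte string no longer breaks join.
bytestring_file_name = u'\u732b.txt'.encode('utf-8')
joined = safe_join(u'something', bytestring_file_name, 'utf-8')
```

note that a name decoded with 'replace' may no longer match the name on disk, so it is only good for display or logging; to stat or open such a file you still need the original byte string.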


gabor


