LC_ALL and os.listdir()

Kenneth Pronovici pronovic at skyjammer.com
Wed Feb 23 02:03:56 EST 2005


I have some confusion regarding the relationship between locale,
os.listdir() and unicode pathnames.  I'm running Python 2.3.5 on a
Debian system.  If it matters, all of the files I'm dealing with are on
an ext3 filesystem.

The real code this problem comes from takes a configured set of
directories to deal with and walks through each of those directories
using os.listdir().

Today, I accidentally ran across a directory containing four "normal"
files (with ASCII filenames) and one file with a two-character non-ASCII
filename.  My code, which was doing something like this:
   
   for entry in os.listdir(path):   # path is <type 'unicode'>
      entrypath = os.path.join(path, entry)

suddenly started blowing up with the dreaded unicode error:

   UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in 
   position 1: ordinal not in range(128)
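The sketch below shows one plausible origin of that message. It is written in Python 3 and assumes that Python 2's posixpath.join() built its result as path += '/' + entry. Under that assumption, the '/' + entry piece stays a byte string. Appending it to a unicode path forces an implicit ASCII decode, which fails at byte 0xe2 in position 1, because position 0 is the slash. The function py2_style_join() and the path "/some/dir" are hypothetical. The function only imitates that conversion.

```python
def py2_style_join(path, entry):
    """Hypothetical imitation of Python 2's posixpath.join(unicode, str).

    The '/' + entry piece stays a byte string; concatenating it onto a
    unicode path makes Python 2 decode it as ASCII behind the scenes.
    """
    if isinstance(entry, bytes):
        piece = (b"/" + entry).decode("ascii")  # the implicit decode, made explicit
    else:
        piece = "/" + entry
    return path + piece


if __name__ == "__main__":
    print(py2_style_join("/some/dir", "utflist.long.gz"))  # /some/dir/utflist.long.gz
    try:
        py2_style_join("/some/dir", b"\xe2\x99\xaa\xe2\x99\xac")
    except UnicodeDecodeError as exc:
        # 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)
        print(exc)
```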

To add insult to injury, it only happened for one of my test users, not
the others.

I ultimately traced the difference in behavior to the LC_ALL setting in
the environment.  One user had LC_ALL set to en_US, and the other didn't
have it set at all.
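To compare two users' environments, a small diagnostic like the one below could be run as each user. It is a Python 3 sketch and works on Unix only, because locale.nl_langinfo() is not available on Windows. It prints LC_ALL, LANG, the character set of the current LC_CTYPE locale, and sys.getfilesystemencoding(). The function name describe_encodings() is hypothetical.

Recent Python 3 releases treat a plain C locale as UTF-8 (PEP 538 and PEP 540). On those versions, an unset locale may no longer produce ASCII decoding the way it did here.

```python
import locale
import os
import sys


def describe_encodings():
    """Collect the locale settings that feed into file-name decoding."""
    return {
        "LC_ALL": os.environ.get("LC_ALL"),             # None when unset
        "LANG": os.environ.get("LANG"),                 # used when LC_ALL/LC_CTYPE are unset
        "codeset": locale.nl_langinfo(locale.CODESET),  # charset of the current LC_CTYPE
        "filesystem": sys.getfilesystemencoding(),      # what os.listdir() decodes str names with
    }


if __name__ == "__main__":
    for key, value in describe_encodings().items():
        print(f"{key:10} {value}")
```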

For the user with LC_ALL set, the os.listdir() call returned this, and
the os.path.join() call succeeded:

   [u'README.strange-name', u'\xe2\x99\xaa\xe2\x99\xac', 
    u'utflist.long.gz', u'utflist.cp437.gz', u'utflist.short.gz']

For the other user without LC_ALL set, the os.listdir() call returned
this, and the os.path.join() call failed with the UnicodeDecodeError
exception:

   [u'README.strange-name', '\xe2\x99\xaa\xe2\x99\xac', 
    u'utflist.long.gz', u'utflist.cp437.gz', u'utflist.short.gz']

Note that in this second result, element [1] is not a unicode string
while the other four elements are.
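The model below reproduces both listings. It is offered as a sketch, not as a description of CPython internals: decode each raw name with the encoding the locale implies, and keep the raw byte string if decoding fails.

The bytes \xe2\x99\xaa\xe2\x99\xac are the UTF-8 encoding of U+266A and U+266C (♪♬). Decoded as ISO-8859-1, the character set glibc normally associates with a plain en_US locale, they become six unrelated characters. That matches the first listing. Decoded as ASCII, they fail and stay bytes, which matches the second listing.

The code is Python 3, where str plays the role of Python 2's unicode and bytes plays the role of Python 2's str. The name listdir_model() is hypothetical.

```python
def listdir_model(raw_names, encoding):
    """Hypothetical model of os.listdir(unicode_path) as observed above.

    Each raw byte name is decoded with `encoding`; names that cannot be
    decoded are returned unchanged as bytes.
    """
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(encoding))
        except UnicodeDecodeError:
            result.append(raw)  # stays a byte string, like element [1]
    return result


RAW_NAMES = [
    b"README.strange-name",
    b"\xe2\x99\xaa\xe2\x99\xac",  # UTF-8 bytes for U+266A U+266C
    b"utflist.long.gz",
    b"utflist.cp437.gz",
    b"utflist.short.gz",
]

if __name__ == "__main__":
    # iso-8859-1 ~ plain en_US, ascii ~ C locale, utf-8 ~ e.g. en_US.UTF-8
    for enc in ("iso-8859-1", "ascii", "utf-8"):
        print(enc, listdir_model(RAW_NAMES, enc))
```

With a UTF-8 locale, the same model yields the intended two-character name.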

Can anyone explain:

   1) Why LC_ALL has any effect on the os.listdir() result? 
   2) Why only 4 of the 5 files come back as unicode strings?
   3) The proper "general" way to deal with this situation?

My goal is to build generalized code that consistently works with all
kinds of filenames.  Ultimately, all I'm trying to do is copy some files
around.  I'd really prefer to find a programmatic way to make this work
that is independent of the user's configured locale, if possible.
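One locale-independent approach is to never decode file names at all. This Python 3 sketch assumes the goal is only to copy the regular files sitting directly inside one directory into another. When os.listdir() is given a byte-string path, it returns byte-string entries. Because os.path.join() and shutil.copy2() also accept bytes, no codec is involved. In Python 2 the equivalent would be to pass a plain str path instead of a unicode one.

The helper copy_flat() is hypothetical and does not recurse into subdirectories.

```python
import os
import shutil
import tempfile


def copy_flat(src_dir, dst_dir):
    """Copy the regular files directly inside src_dir into dst_dir.

    Both directories are converted to bytes up front, so os.listdir()
    hands back raw byte names and nothing is decoded with the locale.
    """
    src = os.fsencode(src_dir)
    dst = os.fsencode(dst_dir)
    copied = []
    for entry in os.listdir(src):             # bytes path in, bytes names out
        entrypath = os.path.join(src, entry)  # bytes + bytes: no implicit decode
        if os.path.isfile(entrypath):
            shutil.copy2(entrypath, os.path.join(dst, entry))
            copied.append(entry)
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        for name in (b"README.strange-name", b"\xe2\x99\xaa\xe2\x99\xac"):
            with open(os.path.join(os.fsencode(src), name), "wb") as handle:
                handle.write(b"example\n")
        print(sorted(copy_flat(src, dst)))
```

Names are decoded only when they must be shown to a person. That way a bad guess about the encoding affects only what is displayed, never which file gets copied.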

Thanks for the help,

KEN

--
Kenneth J. Pronovici <pronovic at ieee.org>


