LC_ALL and os.listdir()

Wed Feb 23 13:08:49 EST 2005

On Wed, Feb 23, 2005 at 01:03:56AM -0600, Kenneth Pronovici wrote:
[snip]
> Today, I accidentally ran across a directory containing three "normal"
> files (with ASCII filenames) and one file with a two-character unicode
> filename.  My code, which was doing something like this:
>    
>    for entry in os.listdir(path):   # path is <type 'unicode'>
>       entrypath = os.path.join(path, entry)
> 
> suddenly started blowing up with the dreaded unicode error:
> 
>    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in 
>    position 1: ordinal not in range(128)

Sorry to reply to my own note, but after sleeping on it, I think I've
come up with a reasonable solution.  Now that I've dug further and my
eyes are less bleary, everything seems to work as long as I pass only
plain byte strings (type str), never unicode objects, to the filesystem
functions.

I think I can solve my problem by converting any unicode strings read
from my configuration into UTF-8-encoded byte strings using encode().
Using this solution, all of my existing regression tests still pass, and
my code seems to make it past the unusual directory.
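A minimal sketch of the encode() workaround, written for Python 3, where
the equivalent of a Python 2 byte string is bytes rather than str.  It
assumes configuration values arrive as text and that file names on disk
are UTF-8; list_entry_paths is a hypothetical helper name.  Passing a
bytes path makes os.listdir() return bytes for every entry, so
os.path.join() never has to mix text and byte strings.

```python
import os


def list_entry_paths(path):
    """Return the full path of every entry in `path`, all as bytes.

    Hypothetical helper; assumes file names on disk are UTF-8.
    """
    if isinstance(path, str):
        # Same idea as the encode() workaround: turn text into bytes once.
        path = path.encode("utf-8")
    # With a bytes argument, os.listdir() returns bytes for every entry,
    # so joining never combines text and bytes.
    return [os.path.join(path, entry) for entry in os.listdir(path)]
```

The trade-off is that every path you get back is bytes, so anything that
displays or logs it has to decode it again.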

>    [u'README.strange-name', '\xe2\x99\xaa\xe2\x99\xac', 
>     u'utflist.long.gz', u'utflist.cp437.gz', u'utflist.short.gz']
> 
> Note that in this second result, element [1] is not a unicode string
> while the other four elements are.

I'm still confused as to why this happens, but since I've worked around
it, I guess I don't care so much.
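A likely explanation, for anyone who hits the same thing: in Python 2,
os.listdir() called with a unicode argument decodes each name using the
filesystem encoding, which comes from the locale (LC_ALL and friends).
Names that cannot be decoded are returned as plain byte strings instead.
Under the default C locale that encoding is ASCII, so the UTF-8 name
'\xe2\x99\xaa\xe2\x99\xac' stays undecoded, and os.path.join() then fails
when it implicitly decodes that byte string as ASCII to combine it with
the unicode path.  The sketch below is Python 3 and assumes UTF-8 names;
as_text_names is a hypothetical helper that coerces such a mixed list to
text.

```python
def as_text_names(entries, encoding="utf-8"):
    """Coerce a mixed list of text and byte-string names to text.

    Hypothetical helper; assumes names are in `encoding` (UTF-8 here).
    """
    names = []
    for entry in entries:
        if isinstance(entry, bytes):
            # surrogateescape keeps undecodable bytes recoverable
            # instead of raising UnicodeDecodeError.
            entry = entry.decode(encoding, "surrogateescape")
        names.append(entry)
    return names
```

In Python 3 itself, os.listdir() with a str argument always returns str
(undecodable bytes become surrogates), so a mixed list like the one
above no longer appears.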

Thanks,

KEN

-- 
Kenneth J. Pronovici <pronovic at ieee.org>