LC_ALL and os.listdir()
Kenneth Pronovici
pronovic at skyjammer.com
Wed Feb 23 13:08:49 EST 2005
On Wed, Feb 23, 2005 at 01:03:56AM -0600, Kenneth Pronovici wrote:
[snip]
> Today, I accidentally ran across a directory containing three "normal"
> files (with ASCII filenames) and one file with a two-character unicode
> filename. My code, which was doing something like this:
>
>    for entry in os.listdir(path):   # path is <type 'unicode'>
>       entrypath = os.path.join(path, entry)
>
> suddenly started blowing up with the dreaded unicode error:
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in
> position 1: ordinal not in range(128)
Sorry to reply to my own note, but after sleeping on it, I think I've
come up with a reasonable solution. Now that I've dug further and my
eyes are less bleary, everything seems to work as long as I pass only
simple (non-unicode) strings to the filesystem functions.
I think that I can solve my problem by just converting any unicode
strings from configuration into utf-8 simple strings using encode().
Using this solution, all of my existing regression tests still pass, and
my code seems to make it past the unusual directory.
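
A minimal sketch of that workaround, assuming UTF-8 is the right
encoding for the filenames on disk. The helper names to_fs_bytes and
list_entries are made up for illustration. The point is only that the
path is encoded once, up front, so that os.listdir() and os.path.join()
never see a mix of unicode and simple strings:

```python
import os


def to_fs_bytes(path, encoding="utf-8"):
    """Return path as a simple (byte) string, encoding it if needed.

    The utf-8 default is an assumption; use whatever encoding the
    filesystem actually stores names in.
    """
    if isinstance(path, bytes):
        return path               # already a simple string, leave it alone
    return path.encode(encoding)  # unicode -> encoded simple string


def list_entries(path):
    """Return the full path of every entry in path, as simple strings."""
    bpath = to_fs_bytes(path)
    # Both arguments to os.path.join() are byte strings here, so no
    # implicit ASCII decoding is ever attempted.
    return [os.path.join(bpath, entry) for entry in os.listdir(bpath)]
```

Called with the unicode path from configuration, list_entries(path)
should hand back simple strings for every entry, including the
strangely named one.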
> [u'README.strange-name', '\xe2\x99\xaa\xe2\x99\xac',
> u'utflist.long.gz', u'utflist.cp437.gz', u'utflist.short.gz']
>
> Note that in this second result, element [1] is not a unicode string
> while the other three elements are.
I'm still confused as to why this happens, but since I can work around
it, I guess I don't care so much.
Thanks,
KEN
--
Kenneth J. Pronovici <pronovic at ieee.org>