LC_ALL and os.listdir()

Kenneth Pronovici pronovic at skyjammer.com
Wed Feb 23 18:37:16 EST 2005


On Wed, Feb 23, 2005 at 10:07:19PM +0100, "Martin v. Löwis" wrote:
> So we have three options:
> 1. skip this string, only return the ones that can be
>    converted to Unicode. Give the user the impression
>    the file does not exist.
> 2. return the string as a byte string
> 3. refuse to listdir altogether, raising an exception
>    (i.e. return nothing)
> 
> Python has chosen alternative 2, allowing the application
> to implement 1 or 3 on top of that if it wants to (or
> come up with other strategies, such as user feedback).

Understood.  This appears to be the most flexible solution among the
three.
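
To check my understanding, here is a minimal sketch of options 1 and 3
layered on top of option 2. It assumes Python 3, where passing a bytes
path to os.listdir() returns bytes names. The helper names decode_names
and listdir_text, and the UTF-8 default, are my own illustrative choices,
not anything from the standard library.

```python
import os


def decode_names(raw_names, encoding="utf-8", strict=False):
    """Decode byte-string file names (option 2) into text.

    strict=False skips names that cannot be decoded (option 1).
    strict=True re-raises the UnicodeDecodeError (option 3).
    """
    decoded = []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            if strict:
                raise
            # Option 1: drop the name, as if the file did not exist.
    return decoded


def listdir_text(path, encoding="utf-8", strict=False):
    # A bytes path makes os.listdir() return bytes names.
    raw_names = os.listdir(os.fsencode(path))
    return decode_names(raw_names, encoding, strict)
```

Keeping the decoding step separate from the directory read means the
policy (skip or fail) can be changed without touching the listing code.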

> >3) The proper "general" way to deal with this situation?
> 
> You can choose option 1 or 3; you could tell the user
> about it, and then ignore the file, or you could try to
> guess the encoding (UTF-8 would be a reasonable guess).

Ok.
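
A rough sketch of the "guess, then tell the user" idea, assuming
Python 3 bytes names. The functions guess_decode and describe are
hypothetical helpers, and the candidate list holds only UTF-8 because
that is the one guess suggested above.

```python
def guess_decode(raw_name, candidates=("utf-8",)):
    """Return (text, encoding) for the first candidate that decodes.

    Returns (None, None) when no candidate encoding fits.
    """
    for encoding in candidates:
        try:
            return raw_name.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    return None, None


def describe(raw_name):
    """Build a message for the user about one raw file name."""
    text, encoding = guess_decode(raw_name)
    if text is None:
        # repr() keeps the undecodable bytes visible to the user.
        return "skipping undecodable name %r" % (raw_name,)
    return "%s (decoded as %s)" % (text, encoding)
```

A successful decode only shows that the bytes are valid in that
encoding, not that the guess is right, so the encoding is reported
alongside the name.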

> >My goal is to build generalized code that consistently works with all
> >kinds of filenames.
> 
> Then it is best to drop the notion that file names are
> character strings (because some file names aren't). You
> do so by converting your path variable into a byte
> string. To do that, you could try
[snip]
> So your code would read
> 
> try:
>   path = path.encode(sys.getfilesystemencoding() or
>                      sys.getdefaultencoding())
> except UnicodeError:
>   print >>sys.stderr, "Invalid path name", repr(path)
>   sys.exit(1)

This makes sense to me.  I'll work on implementing it that way.
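
For reference, a sketch of how I read that suggestion, written for
Python 3 rather than the Python 2 syntax quoted above. The names
to_byte_path and list_raw are hypothetical. The encode is strict, as in
the quoted code, so an unencodable path raises UnicodeError for the
caller to report.

```python
import os
import sys


def to_byte_path(path):
    """Return path as a byte string.

    Raises UnicodeError if a text path cannot be encoded with the
    filesystem encoding.
    """
    if isinstance(path, bytes):
        return path  # already a byte string, nothing to convert
    encoding = sys.getfilesystemencoding() or sys.getdefaultencoding()
    return path.encode(encoding)


def list_raw(path):
    """List a directory, returning every entry as a byte string."""
    return os.listdir(to_byte_path(path))
```

The caller would wrap to_byte_path() in try/except UnicodeError and
print the repr() of the offending path before exiting, as in the quoted
snippet.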

Thanks for the in-depth explanation!

KEN

-- 
Kenneth J. Pronovici <pronovic at ieee.org>
Personal Homepage: http://www.skyjammer.com/~pronovic/
"They that can give up essential liberty to obtain a little 
 temporary safety deserve neither liberty nor safety." 
      - Benjamin Franklin, Historical Review of Pennsylvania, 1759 


