How do I obtain the filenames' encoding after os.listdir?

Wed Jun 4 21:36:53 EDT 2003

Grzegorz Adam Hankiewicz wrote:

> Hi.
> 
> I'm using os.listdir() to read a directory. The function returns
> a list of string objects:
> 
> In [3]: l = os.listdir(".")
> In [7]: l[6]
> Out[7]: 'm\xfasica.mid'
> In [8]: l[6].decode("latin1")
> Out[8]: u'm\xfasica.mid'
> 
> I want to transform the strings to unicode objects so I can use them
> with pygtk, and the current process works ok for my machine. The
> question is how do I know the encoding of the filenames?  Currently
> I'm presuming latin1, but if somebody else uses a different encoding,
> how do I know which one?
> 
> The documentation says string.decode can use as parameter "ignore"
> or "replace", but using them only raises LookupErrors.

See Martin's post - but for Python 2.3, sys.getfilesystemencoding() is 
what you should be passing to decode.  On Windows, this will generally 
be "mbcs", and I believe that on some systems, it may be "utf8".  On 
platforms with no Unicode knowledge of the filesystem built into Python, 
it should return "ascii".
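
A minimal sketch of that decoding step, written to run on both old and current Pythons. The helper name decode_filename is made up for illustration, and the fallback to "ascii" when no filesystem encoding is reported is an assumption. It also shows why "ignore" and "replace" seemed to fail in the quoted question: the first argument to decode is always the encoding name, so the error handler has to go second.

```python
import os
import sys


def decode_filename(raw_name, encoding=None):
    """Decode a byte-string filename to text (hypothetical helper)."""
    if encoding is None:
        # Assumption: fall back to ASCII if no encoding is reported.
        encoding = sys.getfilesystemencoding() or "ascii"
    # The error handler is the SECOND argument; bytes that cannot be
    # decoded become U+FFFD instead of raising UnicodeDecodeError.
    return raw_name.decode(encoding, "replace")


# A byte-string path returns byte-string names, which we decode ourselves.
names = [decode_filename(name) for name in os.listdir(b".")]

# Passing only "ignore" makes it the encoding name, hence the LookupError.
try:
    b"m\xfasica.mid".decode("ignore")
    lookup_failed = False
except LookupError:
    lookup_failed = True
```

With an explicit encoding, decode_filename(b'm\xfasica.mid', 'latin1') gives the same result as the decode("latin1") call in the question.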

Again, for Python 2.3, os.listdir(u'.') on platforms that do have that 
knowledge will return a list of Unicode objects directly.
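
As a rough illustration, the type of the path argument selects the type of the names returned. This sketch assumes a platform where Python knows the filesystem encoding. On some Python 2 versions, names that cannot be decoded may still come back as byte strings, so checking the type of each entry is a reasonable precaution.

```python
import os

# A Unicode path asks for Unicode names back.
text_names = os.listdir(u".")

# A byte-string path asks for the undecoded names instead.
raw_names = os.listdir(b".")
```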

Depending on your platform, moving to 2.3 may be worthwhile if this is a 
common problem for you.

Mark.