non standard path characters

"Martin v. Löwis" martin at v.loewis.de
Thu May 31 17:12:08 EDT 2007


> thanks for that. I guess the problem is that when a path is obtained
> from such an object the code that gets the path usually has no way of
> knowing what the intended use is. That makes storage as simple bytes
> hard. I guess the correct way is to always convert to a standard (say
> utf8) and then always know the required encoding when the thing is to be
> used.

Inside the program itself, the best thing is to represent path names
as Unicode strings as early as possible; later, information about the
original encoding may be lost.

If you obtain path names from the os module, pass Unicode strings
to listdir in order to get back Unicode strings. If they come from
environment variables or command line arguments, use
locale.getpreferredencoding() to find out what the encoding should
be.
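
As a rough sketch of the above (not a definitive recipe): the helper
name to_text is made up for illustration, and the code is written so
that it also runs on later Python versions, where the text type is
called str rather than unicode. It assumes that byte strings arriving
from the environment or the command line are in the locale's preferred
encoding, which is only a best guess.

```python
import locale
import os


def to_text(name, encoding=None):
    """Return name as a text string, decoding it if it is a byte string.

    to_text is a hypothetical helper, not part of any library.
    """
    if isinstance(name, bytes):
        # Assumption: the bytes are in the locale's preferred encoding
        # unless the caller knows better and passes an encoding.
        return name.decode(encoding or locale.getpreferredencoding())
    return name


# Passing a text (Unicode) path asks listdir for text names back.
names = os.listdir(u".")

# A value that arrived as bytes is decoded as early as possible.
home = to_text(b"/home/user", "ascii")
```

Decoding once, at the point where the name enters the program, means
the rest of the code never has to guess an encoding.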

If they come from a zip file, Tijs already explained what the encoding
is.

Always expect encoding errors; if they occur, choose to either skip
the file name, or report an error to the user. Notice that listdir
may return a byte string if decoding fails (this may only happen
on Unix).
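
One possible way to handle this, as a sketch only: the function name
decode_names and the idea of collecting the failures in a second list
are my own choices for illustration. Names that are already text are
passed through; byte strings that cannot be decoded are set aside, so
the caller can either skip them or report them to the user.

```python
def decode_names(raw_names, encoding):
    """Split names into (decoded text names, undecodable byte names).

    decode_names is a hypothetical helper, not part of any library.
    """
    decoded = []
    failed = []
    for raw in raw_names:
        if not isinstance(raw, bytes):
            # Already a text string; nothing to decode.
            decoded.append(raw)
            continue
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Keep the raw bytes so they can be skipped or reported.
            failed.append(raw)
    return decoded, failed


# The second name is Latin-1 encoded and is not valid UTF-8.
good, bad = decode_names([b"ok.txt", b"caf\xe9.txt", u"readme"], "utf-8")
for raw in bad:
    print("cannot decode file name: %r" % (raw,))
```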

Regards,
Martin


