pep 277, Unicode filenames & mbcs encoding &c.

Martin v. Löwis martin at v.loewis.de
Tue Oct 21 17:59:33 EDT 2003


"Edward K. Ream" <edreamleo at charter.net> writes:

> Am I reading pep 277 correctly?  On Windows NT/XP, should filenames always
> be converted to Unicode using the mbcs encoding?

What do you mean by "should"? "Should Python always..." or "Should
the application always..."?

PEP 277 actually answers neither question. As Vincent explains,
nothing changes for byte strings passed to the API. The changes
affect only Unicode strings passed to functions that expect file names.
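
As a rough illustration of that distinction, the sketch below lists a
scratch directory twice, once with a text path and once with a byte
path. It is written for current Python 3, where str is the Unicode type
and bytes is the byte-string type; the helper name list_names_both_ways
and the file name example.txt are made up for this example.

```python
import os
import shutil
import tempfile

def list_names_both_ways(directory):
    """Return (text_names, byte_names) for the same directory."""
    # A text (Unicode) path yields text names.
    text_names = os.listdir(directory)
    # A byte-string path yields byte-string names.
    byte_names = os.listdir(os.fsencode(directory))
    return text_names, byte_names

# Build a scratch directory holding one empty file, list it, clean up.
scratch = tempfile.mkdtemp()
with open(os.path.join(scratch, "example.txt"), "w"):
    pass
text_names, byte_names = list_names_both_ways(scratch)
shutil.rmtree(scratch)
```

The type of the argument decides the type of the result, which is why
byte-string callers see no change in behaviour.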

> For example,
> 
> myFile = unicode(__file__, "mbcs", "strict")
> 
> This seems to work

And it has nothing to do with PEP 277: You are not passing myFile to
any API function.

If you mean to use myFile as a file name, then yes: this is intended
to work. However, using plain __file__ directly should also work.

> Am I correct that conversions to Unicode (using "mbcs" on Windows) should be
> done before passing arguments to os.path.join, os.path.split,
> os.path.normpath, etc. ?  

You should use either only Unicode strings or only byte strings. Not
all functions in os.path are affected by the PEP 277 implementation
(although they probably should be).
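
One way to stay with a single string type is to decode every byte-string
component explicitly before it reaches os.path. The following is only a
sketch for current Python 3: as_text and join_as_text are hypothetical
helpers, and "utf-8" is used so that the example runs on any platform
(on Windows the encoding discussed in this thread would be "mbcs").

```python
import os.path

def as_text(name, encoding):
    """Decode a byte-string name; pass text through unchanged."""
    if isinstance(name, bytes):
        return name.decode(encoding, "strict")
    return name

def join_as_text(encoding, *parts):
    """Join path components after converting all of them to text."""
    return os.path.join(*[as_text(part, encoding) for part in parts])

# One byte-string component and one text component, joined as text.
joined = join_as_text("utf-8", b"caf\xc3\xa9", "menu.txt")
```

Decoding at the boundary keeps the choice of encoding in the
application's hands instead of leaving it to an implicit conversion.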

> Presumably os.path functions use the default
> system encoding to convert strings to Unicode, which isn't likely to be
> "mbcs" or anything else useful :-)

Right. This is actually unfortunate.

> Are there any situations where some other encoding should be used instead on
> Windows?

Yes: when you get data from a cmd.exe window, which uses the OEM code
page rather than the ANSI code page that "mbcs" selects.
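
To make the cmd.exe case concrete, here is a small sketch of the same
bytes decoded under two code pages. It assumes a US-English setup where
the console uses code page 437 and the ANSI code page is 1252; other
locales use different pages, and the file name is invented for the
example.

```python
# Bytes as they might arrive from a console that uses code page 437.
raw = b"r\x82sum\x82.txt"

# Decoded with the console's (OEM) code page: 0x82 is e-acute.
as_oem = raw.decode("cp437")

# Decoded with the ANSI code page instead: 0x82 is a low quotation mark,
# so the name comes out wrong without any error being raised.
as_ansi = raw.decode("cp1252")
```

Both calls succeed, so a wrong choice of encoding shows up only as a
file name that does not match anything on disk.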

> What about other platforms? For instance, does Linux allow non-ascii
> file names?

Yes, it does.

> If so, what encoding should be specified when converting to Unicode?

Nobody knows, but the convention is to use the locale's encoding, as
returned by locale.getpreferredencoding().
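
A minimal sketch of that convention, for current Python 3:
decode_filename is a hypothetical helper that falls back to the
locale's preferred encoding when the caller does not name one. Nothing
guarantees that a given file name was actually written in that
encoding, so the strict decode may raise UnicodeDecodeError.

```python
import locale

def decode_filename(raw_name, encoding=None):
    """Decode a byte-string file name, defaulting to the locale's encoding."""
    if encoding is None:
        # The convention: assume names use the locale's encoding.
        encoding = locale.getpreferredencoding()
    return raw_name.decode(encoding)

# An ASCII-only name decodes the same way under the usual locale encodings.
name = decode_filename(b"notes.txt")
```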

Regards,
Martin



