[Python-Dev] PEP 277 (unicode filenames): please review

Martin v. Loewis martin@v.loewis.de
13 Aug 2002 16:45:36 +0200


Guido van Rossum <guido@python.org> writes:

> Aha!  So MBCS is not an encoding: it's an indirection for a variety of
> encodings.  (Is there a way to find out what the encoding is?)

Correct. In Python, locale.getdefaultlocale()[1] returns the encoding;
the underlying API function is GetACP, and Python uses it as

    PyOS_snprintf(encoding, sizeof(encoding), "cp%d", GetACP());

There is a second indirection, the "OEM code page", which they use:
- for on-disk FAT short file names,
- for the cmd.exe window

Python currently offers no access to GetOEMCP().

> Do you mean that the condition on
> 
> #if defined(HAVE_LANGINFO_H) && defined(CODESET)
> 
> is reliably false on Windows?  Otherwise _locale.setlocale() could set
> it.

Correct. nl_langinfo is a Sun invention (I believe) which made it into
Posix; Microsoft ignores it.

> So as long as they use 8-bit it's not our problem, right.  Another
> reason to avoid prodicing Unicode without a clue that the app expects
> Unicode (alas).  (BTW I find a Unicode argument to os.listdir() a
> sufficient clue.  IOW os.listdir(u".") should return Unicode.)

Indeed, that would be consistent. I deliberately want to leave this
out of PEP 277. On Unix, things are not that clear - as Jack points
out, readlink() and getcwd() also need consideration.

> > Ok, I'll update the PEP.
> 
> To what?  (It would be bad if I convinced you at the same time you
> convinced me of the opposite. :-)

I haven't changed anything yet, and I won't. 

In this terrain, Windows has the cleaner API (they consider file names
as character strings, not as byte strings), so doing the right thing
is easier.

Regards,
Martin