[Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

Neil Hodgson nyamatongwe at gmail.com
Sat Jul 16 09:30:07 CEST 2005


Martin v. Löwis:

> - But then, the wide API gives all results as Unicode. If you want to
>   promote only those entries that need it, it really means that you
>   only want to "demote" those that don't need it. But how can you tell
>   whether an entry needs it? There is no API to find out.

   I wrote a patch for os.listdir at
http://www.scintilla.org/difft.txt that uses WideCharToMultiByte to
check whether a wide name can be represented in a particular code page
and only uses that representation if it fits. This works for Windows
code pages, including ASCII and "mbcs", but since Python's
sys.getdefaultencoding() can be something that has no code page
equivalent, the patch would have to try converting in strict mode and
treat failure as a signal to leave the name as Unicode.
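
   The strict-mode fallback could look roughly like the following.
This is only a sketch in Python 3 terms, where str stands in for the
wide name and bytes for the code-page form. The helper name
demote_if_representable is hypothetical and is not taken from the
patch.

```python
def demote_if_representable(name, encoding):
    """Return name as bytes if encoding can represent it, else unchanged.

    Hypothetical helper: 'name' is a text string from the wide API and
    'encoding' is whatever encoding the caller treats as the default.
    """
    try:
        # Strict mode raises instead of substituting '?' or a best-fit
        # character, so a successful encode means the name fits exactly.
        return name.encode(encoding, "strict")
    except UnicodeEncodeError:
        # The name does not fit; leave it as Unicode.
        return name


ascii_name = demote_if_representable("abc.txt", "ascii")
accented_in_ascii = demote_if_representable("caf\u00e9.txt", "ascii")
accented_in_cp1252 = demote_if_representable("caf\u00e9.txt", "cp1252")
```

   With these inputs, only the second call keeps the name as text,
because ASCII cannot represent the accented character.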

>   You could declare that anything with characters >128 needs it,
>   but that would be an incompatible change: If a character >128 in
>   the system code page is in a file name, listdir currently returns
>   it in the system code page. It then would return a Unicode string.

   I now quite like returning Unicode for anything non-ASCII on
Windows, as there is no ambiguity in what the result means and there
will be no need to change all the system calls to translate from the
default encoding. It is a change to the API that can break code, but
it should break with an exception. Assuming that byte string arguments
use Python's default encoding looks more dangerous, since it changes
behaviour without any notification.
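
   To illustrate the difference between failing loudly and failing
silently, here is a small sketch in Python 3 terms. The code pages
(cp1252 for the stored name, cp437 for the wrong guess) are
assumptions chosen for the example, not anything from the patch.

```python
# Bytes for a non-ASCII file name as a cp1252 system would store them.
raw = "caf\u00e9.txt".encode("cp1252")

# Guessing the wrong encoding raises nothing: every byte maps to some
# character, so the caller silently gets a different name.
misread = raw.decode("cp437")

# A strict ASCII assumption fails with an exception instead.
try:
    raw.decode("ascii")
    failed_loudly = False
except UnicodeDecodeError:
    failed_loudly = True
```

   The silent case is the one that worries me: misread is a valid
string of the same length, just not the name that is on disk.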

   Neil

