[Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

Thomas Heller theller at python.net
Fri Jul 8 16:48:04 CEST 2005


Neil Hodgson <nyamatongwe at gmail.com> writes:

> Thomas Heller:
>
>> OTOH, I once had a bug report from a py2exe user who complained that the
>> program didn't start when installed in a path with japanese characters
>> on it.  I tried this out, the bug existed (and still exists), but I was
>> astonished how many programs behaved the same: On a PC with english
>> language settings, you cannot start WinZip or Acrobat Reader (to give
>> just some examples) on a .zip or .pdf file contained in such a
>> directory.
>
>    Much of the time these sorts of bugs aren't too hard to live with,
> because most non-ASCII names that any user encounters are still in the
> user's locale and so get mapped by Windows.

> It can be a lot of work supporting wide file names. I have just added
> wide file name support to my editor, SciTE, for the second time and am
> about to rip it out again, as it complicates too much code for too few
> beneficiaries. (I want one executable for both Windows NT+ and 9x, so
> wide file names have to be a runtime choice, leading to maybe 50 new
> branches in the code.)
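The runtime choice described above can be sketched in Python terms. This is an illustrative sketch, not SciTE's code: it assumes that `os.path.supports_unicode_filenames` (a standard-library flag that is true where the OS file APIs accept Unicode names directly) is a reasonable stand-in for "running on NT+", and otherwise falls back to narrow bytes in the filesystem encoding. The helper name `native_path` is hypothetical.

```python
import os


def native_path(path: str):
    """Pick the wide or narrow form of a path at runtime (hypothetical helper).

    Where the platform's file APIs take Unicode names directly, return the
    str unchanged; otherwise encode it to bytes with the filesystem encoding,
    roughly what a narrow ("ANSI") code path would receive.
    """
    if os.path.supports_unicode_filenames:
        return path
    return os.fsencode(path)


name = "\u5b66\u6821.txt"
p = native_path(name)
print(type(p).__name__)  # "str" on Windows and macOS, "bytes" on Linux
```

Every call site that touches the file system then has to cope with both forms, which is where the extra branches come from.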

In Python, the basic support for Unicode file and path names is already
there. There's no problem opening a file named
u'\u5b66\u6821\u30c7\u30fc\\blah.py' on WinXP with a German locale.
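A minimal sketch of the same experiment in present-day Python 3, where every `str` path is Unicode. It assumes a file system that can store these characters (NTFS, or a UTF-8 locale on Unix); the temporary directory stands in for the original location, and `blah.py` is simply the name used in the message.

```python
import os
import tempfile

# Directory name from the message: two kanji followed by two katakana.
dirname = "\u5b66\u6821\u30c7\u30fc"

with tempfile.TemporaryDirectory() as root:
    folder = os.path.join(root, dirname)
    os.mkdir(folder)
    path = os.path.join(folder, "blah.py")

    # Write and read back through the Unicode path.
    with open(path, "w", encoding="utf-8") as f:
        f.write("x = 1\n")
    with open(path, encoding="utf-8") as f:
        content = f.read()

    print(os.listdir(folder))  # ['blah.py']
```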

But adding u'\u5b66\u6821\u30c7\u30fc' to sys.path won't allow this file
to be imported as a module. Internally, Python\import.c converts
everything to narrow strings. I started refactoring import.c to work with
PyStringObjects instead of char buffers as a first step (PyUnicodeObjects
could have been added later), but I gave up because there seems to be
absolutely zero interest in it.

OK, it makes no sense to have Python modules in directories with names
like these, but Python itself (especially when frozen or py2exe'd) could
easily live in such a directory.

>    If returning a mixture of unicode and narrow strings from
> os.listdir is the right thing to do, then maybe it is better for
> sys.argv and os.environ to also be mixtures. In patch #1231336 I added
> parallel attributes, sys.argvu and os.environu, to hold unicode
> versions of this information. The alternative, placing unicode items
> in the existing attributes, minimises API size.
>
>    One question here is whether unicode items should be added only
> when the element is outside the user's locale (the CP_ACP code page)
> or whenever the item is outside ASCII. The former is more similar to
> existing behaviour but the latter is safer as it makes it harder to
> implicitly treat the data as being in an incorrect encoding.
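The two policies can be made concrete with a small function. This is a hypothetical sketch, not the code from the patch: it takes each item as a Unicode string (as the wide Windows APIs supply it) and keeps the narrow form only when that is safe under the chosen policy. `cp1252` stands in for the user's CP_ACP code page; that is an assumption (it is the ANSI code page of Western European Windows installations), not part of either policy.

```python
def mixed(items, policy="ascii", acp="cp1252"):
    """Return a list mixing narrow (bytes) and Unicode (str) items.

    policy="locale": use Unicode only when an item cannot be encoded in the
                     ANSI code page `acp` (closer to existing behaviour).
    policy="ascii":  use Unicode whenever an item is not pure ASCII (safer:
                     non-ASCII data is never handed out as ambiguous bytes).
    """
    encoding = "ascii" if policy == "ascii" else acp
    result = []
    for item in items:
        try:
            result.append(item.encode(encoding))
        except UnicodeEncodeError:
            result.append(item)  # not representable: keep the Unicode form
    return result


names = ["readme.txt", "caf\u00e9.txt", "\u5b66\u6821.txt"]
locale_view = mixed(names, policy="locale")
# locale_view == [b"readme.txt", b"caf\xe9.txt", "\u5b66\u6821.txt"]
ascii_view = mixed(names, policy="ascii")
# ascii_view == [b"readme.txt", "caf\u00e9.txt", "\u5b66\u6821.txt"]
```

Under the locale policy, "caf\u00e9.txt" comes back as cp1252 bytes that a program could later misread in another encoding; the ASCII policy avoids that at the cost of returning Unicode more often.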

I can't judge this, but it's easy to experiment with, even in current
Python releases, since sys.argvu and os.environu can also be provided by
extension modules.
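As an illustration of doing this outside the core, here is a sketch that builds an argvu-style list with `ctypes` instead of a compiled extension module. `GetCommandLineW` and `CommandLineToArgvW` are the Win32 functions for retrieving and splitting the wide command line; the helper name `get_argvu` is hypothetical, and the non-Windows branch simply returns `sys.argv`, which Python 3 already decodes to `str`. The Win32 list includes the interpreter (and any interpreter options) ahead of the script name, so it need not line up index-for-index with `sys.argv`.

```python
import sys


def get_argvu():
    """Return the process command line as a list of Unicode strings.

    Hypothetical helper. On Windows it asks the OS for the wide (UTF-16)
    command line, so no characters are lost to the ANSI code page.
    """
    if sys.platform != "win32":
        # Python 3 already exposes sys.argv as str on other platforms.
        return list(sys.argv)

    import ctypes
    from ctypes import wintypes

    # Private WinDLL instances so the argtypes set here stay local.
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    shell32 = ctypes.WinDLL("shell32", use_last_error=True)

    kernel32.GetCommandLineW.restype = wintypes.LPCWSTR
    shell32.CommandLineToArgvW.argtypes = [wintypes.LPCWSTR,
                                           ctypes.POINTER(ctypes.c_int)]
    shell32.CommandLineToArgvW.restype = ctypes.POINTER(wintypes.LPWSTR)
    kernel32.LocalFree.argtypes = [ctypes.c_void_p]
    kernel32.LocalFree.restype = ctypes.c_void_p

    argc = ctypes.c_int(0)
    argv = shell32.CommandLineToArgvW(kernel32.GetCommandLineW(),
                                      ctypes.byref(argc))
    if not argv:  # NULL pointer: the call failed
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        return [argv[i] for i in range(argc.value)]
    finally:
        kernel32.LocalFree(argv)  # the array is allocated with LocalAlloc
```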

But thanks for caring about this stuff; I'm a little worried because
everyone else seems to think everything's OK (?).

Thomas


