[Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

M.-A. Lemburg mal at egenix.com
Sat Jul 9 13:06:37 CEST 2005


Neil Hodgson wrote:
> Thomas Heller:
> 
> 
>>But adding u'\u5b66\u6821\u30c7\u30fc' to sys.path won't allow you to import
>>this file as a module.  Internally, Python\import.c converts everything to
>>strings.  I started to refactor import.c to work with PyStringObjects
>>instead of char buffers as a first step - PyUnicodeObjects could have
>>been added later, but I gave up because there seems to be absolutely zero
>>interest in it.
>
>    Well, most people when confronted with this will rename the
> directory to something simple like "ulib" and continue.

I don't really buy this "trick": what if you happen to have
a home directory with Unicode characters in it?

>>I can't judge this - but it's easy to experiment with it, even in
>>current Python releases, since sys.argvu and os.environu can also be
>>provided by extension modules.
> 
> 
>    It is the effect of this on the non-unicode-savvy that is
> important: if os.environu goes into prereleases of 2.5 then the only
> people that will use it are likely to be those who already try to keep
> their code unicode compliant. There is only likely to be (negative)
> feedback if existing features are made unicode-only or use unicode for
> non-ASCII.
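Thomas Heller's remark that sys.argvu and os.environu could be supplied by extension modules can be illustrated with a small pure-Python sketch. It assumes the raw environment and argument list are available as byte strings (the sample data below stands in for them) and decodes them with a chosen encoding. The helpers decode_mapping and decode_args are hypothetical, written for current Python 3 rather than the 2.x releases under discussion, and are not part of any shipped module.

```python
import sys

def decode_mapping(raw, encoding=None):
    """Hypothetical 'environu'-style helper: decode a mapping whose keys and
    values are byte strings into a mapping of Unicode strings."""
    if encoding is None:
        # Default to the encoding this Python uses for file names.
        encoding = sys.getfilesystemencoding()
    # 'surrogateescape' keeps undecodable bytes round-trippable instead of
    # raising UnicodeDecodeError.
    return {
        key.decode(encoding, "surrogateescape"): value.decode(encoding, "surrogateescape")
        for key, value in raw.items()
    }

def decode_args(raw_args, encoding=None):
    """Hypothetical 'argvu'-style helper: decode byte-string arguments."""
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    return [arg.decode(encoding, "surrogateescape") for arg in raw_args]

# Sample byte-level data standing in for a process's environment and argv.
raw_env = {b"HOME": "/home/\u5b66\u6821".encode("utf-8")}
raw_argv = [b"script.py", "\u30c7\u30fc".encode("utf-8")]

environu = decode_mapping(raw_env, "utf-8")
argvu = decode_args(raw_argv, "utf-8")
```

The 'surrogateescape' error handler is one possible design choice here: bytes that are invalid in the chosen encoding become lone surrogates rather than raising an exception, so they can still be encoded back to the original bytes.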

I don't like the idea of creating a parallel universe for
Unicode - OSes are starting to integrate Unicode filenames
rather quickly (UTF-8 on Unix, UTF-16-LE on Windows), so
it's much better to follow them and start accepting Unicode in
sys.path.

Wouldn't it be easy to have the import logic convert Unicode
entries in sys.path to whatever the OS uses internally (UTF-8
or UTF-16-LE) and then keep the char buffers in place ?
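The suggestion above, converting Unicode entries in sys.path to the OS's native representation while leaving the char-buffer code in place, can be sketched in Python 3 as follows. The helper encode_path_entries is hypothetical and models only the conversion step; it is not how import.c actually works. It defaults to sys.getfilesystemencoding() and accepts an explicit encoding so that both UTF-8 and UTF-16-LE can be shown.

```python
import sys

def encode_path_entries(entries, encoding=None):
    """Hypothetical helper: turn Unicode path entries into byte strings in
    the given encoding so char-buffer based code can keep using them."""
    if encoding is None:
        # The encoding this Python uses for file names on this platform.
        encoding = sys.getfilesystemencoding()
    encoded = []
    for entry in entries:
        if isinstance(entry, str):
            # Unicode entry: convert to the target byte representation.
            encoded.append(entry.encode(encoding))
        else:
            # Already a byte string: pass it through unchanged.
            encoded.append(entry)
    return encoded

# Sample entries; the second uses the non-ASCII name quoted in the thread.
entries = ["/home/user/lib", "/home/\u5b66\u6821\u30c7\u30fc/lib"]
as_utf8 = encode_path_entries(entries, "utf-8")       # e.g. Unix
as_utf16 = encode_path_entries(entries, "utf-16-le")  # e.g. Windows
```

One caveat for the UTF-16-LE case: ASCII characters encode with a zero high byte, so the result contains NUL bytes and cannot be handled as a NUL-terminated char string; on Windows such a buffer would have to go to the wide-character file APIs instead.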

>>But thanks for caring about this stuff - I'm a little bit worried
>>because all the other folks seem to think everything's ok (?).
> 
>    Unicode is becoming more of an issue: many Linux distributions now
> install by default with a UTF-8 locale and other tools are starting to
> use this: GCC 4 now delivers error messages using Unicode quote
> characters like ‘these’ rather than `these'. There are 131 threads
> found by Google Groups for (UnicodeEncodeError OR UnicodeDecodeError)
> and 21 of these were from this June. A large proportion of the threads
> are in language-specific groups, so they are not as visible to core
> developers.

-- 
Marc-Andre Lemburg
eGenix.com


