[Python-Dev] unicode imports

Mon Jun 19 22:41:27 CEST 2006

Kristján V. Jónsson wrote:
> Wouldn't it be possible then to emulate the Unix way?  Simply encode
> any Unicode paths to UTF-8, process them as normal, and then decode
> them just prior to the actual Windows I/O call?

That won't work. People also put path names encoded in the ANSI code
page onto sys.path and expect that to work. It has always worked, and
it is a nearly complete workaround for getting directories with
non-ASCII characters onto sys.path. sys.path is a list, so we have
little control over what gets put onto it.
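
To illustrate the mismatch, here is a minimal sketch in present-day
Python 3 (not the 2006 interpreter under discussion). It assumes cp1252
as the ANSI code page and uses a made-up directory name; both are
illustrative assumptions. It shows that an ANSI-encoded entry is
neither equal to the UTF-8 form nor decodable as UTF-8.

```python
# Hypothetical directory name containing a non-ASCII character.
directory = "C:\\lib\\Jónsson"

# What a user might have put on sys.path as a byte string in the
# ANSI code page (cp1252 is assumed here purely for illustration).
ansi_entry = directory.encode("cp1252")

# What a UTF-8-everywhere scheme would expect to find instead.
utf8_entry = directory.encode("utf-8")


def decodes_as_utf8(raw):
    """Return True if raw is valid UTF-8, False otherwise."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


same_bytes = ansi_entry == utf8_entry
ansi_ok = decodes_as_utf8(ansi_entry)
utf8_ok = decodes_as_utf8(utf8_entry)
```

Since the list can hold either form, code that decodes every entry as
UTF-8 would reject or misread the ANSI ones.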

> Of course, once there, why not do it unicode all the way up to that
> last point?  Unless there are platforms without wchar_t that would
> make sense.

Again, we can't really control that. Also, most platforms have no
wchar_t API for file I/O. We would have to encode each sys.path
element for each stat() call, which would be quite expensive.
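
A rough sketch of where the cost comes from, again in present-day
Python 3. The search path, the suffix list, and the helper
naive_probe_count are all hypothetical, and the loop models the worst
case where no probe succeeds early; the real import machinery differs
in detail.

```python
def naive_probe_count(search_path, module_names, suffixes):
    """Count encode operations when every probe re-encodes its path.

    Each candidate file name is built as text and encoded to bytes
    immediately before the (simulated) stat() call.
    """
    encodes = 0
    for name in module_names:
        for entry in search_path:
            for suffix in suffixes:
                candidate = entry + "/" + name + suffix
                candidate.encode("utf-8")  # stands in for the pre-stat conversion
                encodes += 1
    return encodes


# Illustrative values only.
search_path = ["/usr/lib/python", "/opt/app", "/home/user/lib"]
suffixes = [".py", ".pyc", ".so"]
modules = ["os", "sys", "re", "json"]

total = naive_probe_count(search_path, modules, suffixes)
```

The count grows as modules times path entries times suffixes, so the
conversion is repeated for every probe rather than once per entry.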

> At any rate, I am trying to find a coding path of least resistance
> here.  Regardless of the timeline or acceptance in mainstream Python
> for this feature, it is something I will have to patch in for our
> application.

The path of least resistance should be to use 8.3 directory names.
The one to implement in future Python versions should be the rewrite
of import.c, to operate on PyObject* instead of char*, and perform
conversion to the native API only just before calling the native API.
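
The idea of converting only at the boundary can be sketched in
present-day Python 3 as follows. This is not import.c; the helpers
to_native and find_module_file are hypothetical, and the sketch assumes
a byte-oriented native API as on POSIX, using os.fsencode and
os.fsdecode for the conversion.

```python
import os
import tempfile


def to_native(path):
    """Convert a str or bytes path to the form handed to the OS call."""
    # os.fsencode accepts both str and bytes and returns bytes.
    return os.fsencode(path)


def find_module_file(name, search_path):
    """Return the first existing '<entry>/<name>.py', or None.

    Entries stay as Python objects (str or bytes) throughout; the
    conversion happens only right before os.stat().
    """
    for entry in search_path:
        # Normalise to str for path arithmetic, whatever the entry type.
        entry_text = os.fsdecode(entry)
        candidate = os.path.join(entry_text, name + ".py")
        try:
            os.stat(to_native(candidate))
        except OSError:
            continue
        return candidate
    return None


# Usage: a search path mixing a missing directory, a bytes entry,
# and a str entry, as a user-controlled list might.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "demo_mod.py"), "w").close()
    mixed_path = ["no_such_dir", os.fsencode(tmp), tmp]
    found = find_module_file("demo_mod", mixed_path)
    missing = find_module_file("absent_mod", mixed_path)
```

Keeping the entries as objects means the lookup code does not need to
know which encoding each one arrived in.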

Regards,
Martin