[Patches] [ python-Patches-1552880 ] Unicode Imports

SourceForge.net noreply at sourceforge.net
Tue Sep 12 13:29:03 CEST 2006


Patches item #1552880, was opened at 2006-09-06 04:11
Message generated for change (Comment added) made by anthonybaxter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1552880&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Core (C code)
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Submitted By: Kristján Valur (krisvale)
Assigned to: Nobody/Anonymous (nobody)
Summary: Unicode Imports

Initial Comment:
This patch modifies the import mechanism to fully
support unicode pathnames on Windows.  It does this by
first converting each member of sys.path to utf-8.
Non-unicode strings are assumed to be encoded in the current locale.

The whole of the import logic is then unchanged and
works on the utf-8 strings as though they were regular
ascii strings in the current locale.

Only when file operations are done, such as stat() and
open(), do we then convert from utf-8 back  to unicode
and use the Windows unicode APIs for the job.  This is
also done when initializing Module objects.

This approach has the benefit of having a low
impact on the importing logic, and is thus easy to
verify.  There is however some overhead with the
conversions.
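
To make the round trip concrete, here is a minimal sketch in Python 3.
It is not the patch itself (which is C code in the import machinery);
the names to_utf8 and find_module_file and the locale_encoding
parameter are hypothetical and exist only for illustration.  The search
logic handles nothing but utf-8 byte strings, and only the file
operation converts back to text.

```python
import os


def to_utf8(entry, locale_encoding="latin-1"):
    """Normalize one sys.path-style entry to utf-8 bytes.

    Text entries are encoded directly.  Byte entries are assumed to be
    in locale_encoding (a hypothetical stand-in for the current locale)
    and are re-encoded.
    """
    if isinstance(entry, bytes):
        entry = entry.decode(locale_encoding)
    return entry.encode("utf-8")


def find_module_file(name, utf8_path):
    """Search for name.py using only utf-8 byte strings internally."""
    for directory in utf8_path:
        candidate = os.path.join(directory, name.encode("utf-8") + b".py")
        # Only the file operation converts back to text.  This stands in
        # for calling the Windows unicode APIs at stat()/open() time.
        if os.path.isfile(candidate.decode("utf-8")):
            return candidate
    return None
```

Each lookup pays for one decode per candidate file, which is the
conversion overhead mentioned above.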

At CCP Games we used this approach, backported to
Python 2.3, to get unicode imports working for our
game, EVE Online, thereby solving installation
issues in the Far East.


This patch is submitted as demonstration code to the
python community.  I would like to see unicode fully
supported in 2.6.

Cheers,
Kristján

----------------------------------------------------------------------

>Comment By: Anthony Baxter (anthonybaxter)
Date: 2006-09-12 21:29

Message:
Logged In: YES 
user_id=29957

There's a variety of modules in the standard library that
reference __file__ - if it's potentially going to be a
unicode string, these are going to need to be checked, as
are their callers :-/
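
As a rough illustration of the kind of check involved, the sketch below
(Python 3, with a hypothetical helper name module_dir and an assumed
fs_encoding default) normalizes a __file__ value that may arrive as
either a byte string or a unicode string before using it:

```python
import os


def module_dir(file_attr, fs_encoding="utf-8"):
    """Return the directory part of a __file__ value as text.

    file_attr may be bytes (modelling a byte-string __file__) or str
    (modelling a unicode __file__).  fs_encoding is an assumption made
    for this sketch, not something the patch specifies.
    """
    if isinstance(file_attr, bytes):
        file_attr = file_attr.decode(fs_encoding)
    return os.path.dirname(file_attr)
```

Every caller that concatenates, compares or prints __file__ would need
a similar decision about which type it expects.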

(Now that I've looked closer at some of the issues, I'm
extremely glad this didn't go into 2.5 final at this late stage)

----------------------------------------------------------------------

Comment By: Kristján Valur (krisvale)
Date: 2006-09-12 19:38

Message:
Logged In: YES 
user_id=1262199

I submitted this mostly as a demonstration.  I don't think
the approach is necessarily suitable for a final
implementation because of the use of utf-8 as an
intermediate representation and the price of the conversions
that keep happening.  But perhaps this is the way to go, if
we consider utf-8 to be a stage-1 default file system
encoding for win32.

I also agree that 4 is probably the most sensible approach.
What about discrepancies between e.g. Linux and Windows
then, when importing from a non-trivial path?  On Linux we
would get utf-8, on Windows unicode?

1) would actually make a lot of sense, only in my experience
this tends to lead to a kind of unicode-hell since a program
touched by one unicode object tends to have it percolating
down into every corner.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-09-09 22:31

Message:
Logged In: YES 
user_id=21627

First: Do you want to continue to work on this, or do you
consider this just "demonstration code" (i.e. not
contributed for inclusion in Python), hoping that somebody
else implements this feature?

I think the behavior of __file__ must be more consistent
across platforms, and the selected behaviour must be
documented somewhere. Several definitions of "consistent
behavior" come to mind:
1. __file__ is always a Unicode string
2. __file__ is a byte string if it is ASCII, else Unicode
3. __file__ is a byte string if it is in the system encoding,
else Unicode
4. __file__ is a byte string if it is in the file system
encoding, else Unicode.

The documentation needs to be updated in several places,
e.g. also for inspect.getfile.

I would expect that pydoc would also need to be updated.

Selecting from the options above: I believe 4 is most
compatible with previous versions; 1 and 2 are most
convenient to work with in applications like pydoc which
have to generate HTML (1 is easier to work with, 2 is more
compatible with previous versions).
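
The four definitions can be sketched as a single function.  This is
illustration only, written in Python 3: a bytes result models a byte
string and a str result models a Unicode string.  The function name
file_attr and the two encoding parameters are hypothetical.

```python
def file_attr(path, option, system_encoding="ascii", fs_encoding="utf-8"):
    """Return the value __file__ would take under options 1 to 4.

    path is text.  The encoding defaults are assumptions for this
    sketch; a real implementation would query the interpreter.
    """
    if option == 1:
        # Option 1: always Unicode.
        return path
    # Options 2-4 differ only in which encoding is tried first.
    encoding = {2: "ascii", 3: system_encoding, 4: fs_encoding}[option]
    try:
        return path.encode(encoding)
    except UnicodeEncodeError:
        # Not representable in that encoding: fall back to Unicode.
        return path
```

Under options 2 to 4 the type of the result depends on the path, so
callers cannot assume a single type.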


----------------------------------------------------------------------

Comment By: Kristján Valur (krisvale)
Date: 2006-09-09 21:38

Message:
Logged In: YES 
user_id=1262199

Off the top of my head, it is now unicode.  I considered
trying to convert it back to the default encoding but
decided not to, to keep the patch brief.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-09-09 07:03

Message:
Logged In: YES 
user_id=21627

What is the value of the __file__ attribute of a module when
this patch is used?

----------------------------------------------------------------------
