[Patches] [ python-Patches-410465 ] Allow pre-encoded strings as filenames
noreply@sourceforge.net
Sun, 13 May 2001 01:08:54 -0700
Patches item #410465, was updated on 2001-03-21 21:02
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=410465&group_id=5470
Category: core (C code)
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Mark Hammond (mhammond)
Assigned to: Mark Hammond (mhammond)
Summary: Allow pre-encoded strings as filenames
Initial Comment:
This patch enables most filename parameters to accept
pre-encoded strings. On Windows, the default filename
encoding is "mbcs". On all other platforms, the default
filename encoding is the same as the general default
encoding, so in practice there is no functional change.
However, other platforms can simply plug in their own
encodings.
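As a rough illustration of the platform default described above, the selection logic might look like the following Python sketch. It is not the patch's C code: `FILENAME_ENCODING_OVERRIDES` and `default_filename_encoding` are hypothetical names, and the "plug in" mechanism is assumed to be a simple per-platform table. Note that under Python 3 `sys.getdefaultencoding()` returns "utf-8", not the ASCII default of Python 2.x.

```python
import sys

# Hypothetical registry: a platform "plugs in" its own filename encoding
# by adding an entry keyed on its sys.platform value.
FILENAME_ENCODING_OVERRIDES = {
    "win32": "mbcs",  # the Windows default named in the patch description
}


def default_filename_encoding(platform: str = sys.platform) -> str:
    """Return the encoding used to convert Unicode filenames on `platform`.

    Platforms without an override fall back to the interpreter's general
    default encoding, so for them nothing changes.
    """
    return FILENAME_ENCODING_OVERRIDES.get(platform, sys.getdefaultencoding())


print(default_filename_encoding("win32"))  # mbcs
print(default_filename_encoding("linux"))  # same as sys.getdefaultencoding()
```

Keeping the fallback equal to the general default encoding is what makes the change a no-op on platforms that register nothing.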
Rationale: os.listdir() etc. already return pre-encoded
strings on some platforms (notably Windows). These
pre-encoded strings can already be passed to all of
these functions; however, if you convert such an encoded
string to a Unicode string, the Unicode string cannot be
used to open the file. This patch makes both the
pre-encoded string (as now) and a Unicode representation
of that same string (unlike now) work.
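Modern Python 3 ended up with essentially this behaviour in `os.fsencode()` and `os.fsdecode()`: a bytes name passes through unchanged, while a str name is encoded with the filesystem encoding. The sketch below uses those Python 3 functions (not the 2001 code) to create a file under a pre-encoded bytes name, list it as bytes the way os.listdir() would, and then reopen it through its Unicode form. It assumes the filesystem encoding can represent "café.txt"; `reopen_via_unicode` is a hypothetical helper.

```python
import os
import tempfile


def reopen_via_unicode(directory: str) -> bytes:
    """Create a file under a pre-encoded (bytes) name, then read it back
    through the Unicode (str) form of the same name."""
    dir_bytes = os.fsencode(directory)
    raw_name = os.fsencode("caf\u00e9.txt")  # pre-encoded filename
    with open(os.path.join(dir_bytes, raw_name), "wb") as f:
        f.write(b"hello")

    # os.listdir() on a bytes path hands back bytes names (the directory is
    # freshly created, so it holds exactly this one file).
    [listed] = os.listdir(dir_bytes)

    # Converting the encoded name to Unicode must still find the same file.
    unicode_name = os.fsdecode(listed)
    with open(os.path.join(directory, unicode_name), "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    print(reopen_via_unicode(tmp))  # b'hello'
```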
Things of note:
* I invented a new "Es" PyArg_ParseTuple marker. This
is very similar to "es", except that it leaves string
objects alone, assuming they are already encoded
correctly. "es" assumes a string is in the default
encoding and then encodes it in the new character
set, i.e. a pre-encoded string fails here.
* This means that all affected functions make an extra
string copy. The copy happens even when strings are
passed, and even on platforms without Unicode filesystem
support. The only alternative was a much uglier patch
that somehow used string objects in place but converted
and freed the buffer when given Unicode. This could be
done if desired, but I'm not sure the added code
complexity is worth it.
* New method on win32: nt._getpathname(). This is
almost identical to win32api.GetPathName(), except that
it handles encoded strings. ntpath.py has also been
changed to use it. A hidden bonus of this patch is
that os.path.abspath() will work identically regardless
of whether the Win32 extensions are installed.
* Tested on Linux, Windows 98 and Windows 2000. Still
working out how to build Python on my BeOS box :)
* New test for these semantics added.
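The difference between the existing "es" marker and the proposed "Es" marker can be modeled in Python as below. This is a sketch of the semantics, not the getargs.c implementation: `convert_es` and `convert_Es` are hypothetical names, Python 3 `bytes` stands in for 2.x 8-bit strings and `str` for Unicode objects, "ascii" stands in for the 2.x default encoding, and "cp1252" stands in for "mbcs" (which exists only on Windows).

```python
def convert_es(arg, encoding, default_encoding="ascii"):
    """Model of "es": a byte string is assumed to be in the default
    encoding, so it is decoded with that encoding and re-encoded."""
    if isinstance(arg, bytes):
        arg = arg.decode(default_encoding)  # fails for non-ASCII pre-encoded names
    return arg.encode(encoding)


def convert_Es(arg, encoding):
    """Model of the proposed "Es": a byte string is assumed to be encoded
    correctly already and is returned unchanged; only Unicode is encoded."""
    if isinstance(arg, bytes):
        return arg
    return arg.encode(encoding)


# b'caf\xe9' is the kind of name a Windows listdir() might return.
pre_encoded = "caf\u00e9".encode("cp1252")
print(convert_Es(pre_encoded, "cp1252"))  # b'caf\xe9'
print(convert_Es("caf\u00e9", "cp1252"))  # b'caf\xe9'
try:
    convert_es(pre_encoded, "cp1252")
except UnicodeDecodeError:
    print('"es" rejects the pre-encoded name')
```

Both the pre-encoded bytes and their Unicode form converge on the same bytes under "Es", which is exactly the property the rationale asks for.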
----------------------------------------------------------------------
>Comment By: Mark Hammond (mhammond)
Date: 2001-05-13 01:08
Message:
Logged In: YES
user_id=14198
checked in:
Checking in Lib/ntpath.py;
new revision: 1.35; previous revision: 1.34
Checking in Lib/test/test_support.py;
new revision: 1.23; previous revision: 1.22
Checking in Lib/test/test_unicode_file.py;
initial revision: 1.1
Checking in Lib/test/output/test_unicode_file;
initial revision: 1.1
Checking in Modules/posixmodule.c;
new revision: 2.188; previous revision: 2.187
Checking in Python/bltinmodule.c;
new revision: 2.206; previous revision: 2.205
Checking in Python/getargs.c;
new revision: 2.56; previous revision: 2.55
----------------------------------------------------------------------
Comment By: Mark Hammond (mhammond)
Date: 2001-04-27 05:15
Message:
Logged In: YES
user_id=14198
MAL - please do! I generally look for the least-intrusive
patch when dealing with potentially contentious issues, but
I agree it makes more sense to rationalize.
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-04-27 00:54
Message:
Logged In: YES
user_id=38388
I like the idea of telling the arg parser to accept strings
as-is, but I think that copying all the code just to
implement the new "E" parser is unnecessary. Much easier
would be switching on the second marker
(behind the "e"), e.g. using "et" and "et#".
Do you want me to look into this?
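Lemburg's alternative, a single converter that switches on the character after "e" instead of a duplicated "E" code path, could look roughly like this Python model. `parse_e_format` and `convert_e` are hypothetical names, and the sketch only mirrors the idea: "es" recodes byte strings from the default encoding, "et" passes them through untouched, and a trailing "#" also returns the length.

```python
def parse_e_format(code):
    """Split an "e" format unit ("es", "et", "es#" or "et#") into two flags:
    (pass_bytes_through, want_length)."""
    if code not in ("es", "et", "es#", "et#"):
        raise ValueError("unsupported format unit: %r" % (code,))
    # The second marker selects the behaviour; "#" asks for the length too.
    return code[1] == "t", code.endswith("#")


def convert_e(code, arg, encoding, default_encoding="ascii"):
    """One converter shared by all four variants instead of a copied code path."""
    pass_through, want_length = parse_e_format(code)
    if isinstance(arg, bytes):
        if pass_through:
            data = arg  # "et": trust that the bytes are already encoded
        else:
            data = arg.decode(default_encoding).encode(encoding)  # "es": recode
    else:
        data = arg.encode(encoding)
    return (data, len(data)) if want_length else data


print(convert_e("et", b"caf\xe9", "cp1252"))    # b'caf\xe9'
print(convert_e("et#", "caf\u00e9", "cp1252"))  # (b'caf\xe9', 4)
```

The design point is that the "pass through or recode" decision is one branch in shared code, so the "#" variants come for free.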
----------------------------------------------------------------------
Comment By: Mark Hammond (mhammond)
Date: 2001-03-22 14:10
Message:
Logged In: YES
user_id=14198
I appreciate it is too late for 2.1 for a change of this
size.
I don't think posixmodule is wrong - at least not in the
way you think :)
posix_rename calls:
    return posix_2str(args, "EsEs:rename", rename);
However, it is posix_2str that passes the encoding, not
posix_rename itself. Ditto for posix_1str and
posix_do_stat.
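The structure Mark describes, where the shared helper supplies the encoding once so that callers such as posix_rename never pass it, can be modeled in Python as follows. `FILENAME_ENCODING`, `to_fs_bytes` and these `posix_2str`/`posix_rename` signatures are hypothetical stand-ins for the C helpers; the format string is dropped, and the rename function is passed in as a parameter so the sketch runs without touching the filesystem.

```python
FILENAME_ENCODING = "utf-8"  # hypothetical stand-in for the platform default


def to_fs_bytes(name):
    """'Es'-style conversion: bytes pass through, str is encoded."""
    return name if isinstance(name, bytes) else name.encode(FILENAME_ENCODING)


def posix_2str(args, func):
    """Shared helper for two-filename calls: the helper, not each caller,
    supplies the encoding for both arguments."""
    if len(args) != 2:
        raise TypeError("expected exactly two filename arguments")
    src, dst = (to_fs_bytes(a) for a in args)
    return func(src, dst)


def posix_rename(args, rename_impl):
    # Like the C posix_rename, this only delegates; no encoding is named here.
    return posix_2str(args, rename_impl)


print(posix_rename(("old.txt", b"new.txt"), lambda src, dst: (src, dst)))
# (b'old.txt', b'new.txt')
```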
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2001-03-22 13:45
Message:
Logged In: YES
user_id=6380
Mark, I don't think you expected to get this into 2.1, did
you? It's way too big.
Also, I think your patch to posixmodule.c has some bugs --
if I understand correctly, the format string "Es" requires
two arguments, the encoding and the address of the C string
pointer; but several functions (posix_rename and onwards)
don't pass the encoding name.
----------------------------------------------------------------------
Comment By: Mark Hammond (mhammond)
Date: 2001-03-21 21:04
Message:
Logged In: YES
user_id=14198
doh - forgot to click the checkbox
----------------------------------------------------------------------