[Python-Dev] [Python-3000] Proposed Python 3.0 schedule

James Y Knight foom at fuhm.net
Tue Oct 7 05:22:09 CEST 2008


On Oct 6, 2008, at 8:52 PM, Benjamin Peterson wrote:
> I'm not sure we do. Correct me if I'm wrong, but the "big ticket"
> issue, bytes/unicode filepaths, has been resolved. And looking at the
> tracker, I only see 18 release blockers.


Well, if you mean that the resolution decided upon is to "simply"  
allow access to all system APIs using either byte or unicode strings,  
then it seems to me that there's a rather large amount of work left to  
do...

Here are some problems I found from a few minutes of futzing around
with r66821 of py3k on Linux.

  - Having os.getcwdb isn't much use when you can't even run python in  
the first place when the current directory has "bad" bytes in it.

Currently Python outputs:
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: Py_Initialize: can't initialize sys standard streams
ImportError: No module named encodings.utf_8
Aborted

  - I'd think "find . -type f -print0 | xargs -0 python -c 'pass'"
ought to work (with files with "bad" bytes being returned by find),
which means that Python shouldn't blow up and refuse to start when
argv contains an improperly encoded argument (printing "Could not
convert argument 1 to string" and exiting isn't appropriate behavior).

  - Of course, just being able to start the interpreter isn't quite  
enough: you'll want to be able to access that argument list too,  
somehow (add sys.argvb?).

  - And then, getopt and optparse modules should work on bytestring  
vectors, so that you can use sys.argvb without writing your own  
argument parser. They don't currently.

  - There's no os.environb for bytewise access to the environment.  
Seems important.

  - Isn't it a potential security issue that " 'WHATEVER' in  
os.environ" can return False if WHATEVER had some "bad" bytes in it,  
but spawning a subprocess actually will include WHATEVER in the  
subprocess's environment? Actually, even better: the behavior depends  
on whether you use subprocess.call('foo') or subprocess.call('foo',  
os.environ). The first passes through the "bad" environment variables,  
while the second does not. A bit surprising, perhaps.

  - Shouldn't this work?
   subprocess.call(b'/bin/echo')
Currently raises an exception:
AttributeError: 'int' object has no attribute 'rfind'

  - I suppose sys.path should handle bytestrings on the path, and  
should be populated using the bytes-version of os.environ so that  
PYTHONPATH gets read in properly. Which of course implies that all the  
importers need to handle byte filenames.

  - zipfile.ZipFile(b'whatever.zip') doesn't work.

  - zipfile decodes/encodes the filenames inside the zip file to
unicode, so it can only handle correctly encoded filenames.
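
To make the os.environ item above concrete, here is a minimal sketch of
how the discrepancy can arise. It assumes the str view of the
environment is built by strictly decoding every entry and silently
dropping the ones that fail; the helper name build_str_environ, the
UTF-8 default, and the sample data are all hypothetical, and this is a
simulation rather than the actual py3k code.

```python
def build_str_environ(raw_environ, encoding="utf-8"):
    """Return a str view of a bytes environment.

    Entries whose name or value cannot be decoded are dropped
    (assumed behavior, for illustration only).
    """
    decoded = {}
    for name, value in raw_environ.items():
        try:
            decoded[name.decode(encoding)] = value.decode(encoding)
        except UnicodeDecodeError:
            # The "bad" entry vanishes from the str view, but it is
            # still present in the raw environment a child inherits.
            continue
    return decoded


# Hypothetical raw environment, as the OS would hand it over.
raw_environ = {
    b"HOME": b"/home/user",
    b"WHATEVER": b"bad\xff",  # 0xff is never valid in UTF-8
}

str_environ = build_str_environ(raw_environ)

print("WHATEVER" in str_environ)   # what the program sees
print(b"WHATEVER" in raw_environ)  # what a child would inherit
```

A check against the str view says the variable is absent while the raw
environment still carries it, which is the mismatch that matters if the
check is security-relevant.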

I'm sure there are even more APIs dealing with pathnames, command line
arguments, or environment variables that ought to be able to handle
both bytes and strings, but currently don't.
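
As a rough sketch of what "handle both" could look like for the
subprocess.call(b'/bin/echo') case: the helpers below treat a lone str
or bytes object as a single command rather than as a sequence, and pick
the path separator to match the argument's type. The names
normalize_args and program_basename are hypothetical, and my guess that
the AttributeError comes from indexing a bytes object (which yields
ints) is an assumption, not something I verified in the subprocess
source.

```python
def normalize_args(args):
    """Wrap a single str or bytes command in a list; copy other sequences."""
    if isinstance(args, (str, bytes)):
        return [args]
    return list(args)


def program_basename(program):
    """Return the last path component, for either str or bytes paths."""
    # The separator must have the same type as the path being searched.
    sep = b"/" if isinstance(program, bytes) else "/"
    return program[program.rfind(sep) + 1:]


# Indexing bytes yields an int, which has no rfind() method.
first_item = b"/bin/echo"[0]
print(type(first_item).__name__)

args = normalize_args(b"/bin/echo")
print(program_basename(args[0]))
print(program_basename("/bin/echo"))
```

The same type-matching pattern would have to be repeated in every API
that takes a path, which is why the amount of remaining work looks
large.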

James

