[Python-Dev] Bytes for the command line, process arguments and environment variables
Victor Stinner
victor.stinner at haypocalc.com
Sat Jan 3 04:29:11 CET 2009
Hi,
Python 3.0 is released and supports unicode everywhere, great! But as pointed
out by several people, bytes are required on non-Windows OSes for backward
compatibility. This email is just a summary of the many issues/email threads.
Problems with Python 3.0:
(1) Invalid unicode strings on the command line
=> some people want to get the command line arguments as bytes,
so that Python starts even if undecodable strings are present
on the command line
=> http://bugs.python.org/issue3023
(2) Undecodable environment variables are skipped in os.environ
=> Create os.environb (or something similar) to get these variables
as bytes (and to be able to set new variables as bytes)
=> Read the email thread "Python-3.0, unicode, and os.environ"
(December 2008) started by Toshio Kuratomi
(3) Support bytes for os.exec*() and subprocess.Popen(): process arguments
and environment variables
=> http://bugs.python.org/issue4035: my patch for os.exec*()
=> http://bugs.python.org/issue4036: my patch for subprocess.Popen()
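A toy model of problem (2), assuming a UTF-8 charset. The function name decodable_view is hypothetical and this is not CPython's code: it only keeps the entries whose key and value decode strictly, which is why a variable holding a Latin-1 path simply disappears from a str-only view.

```python
from typing import Dict, Mapping


def decodable_view(env: Mapping[bytes, bytes], encoding: str = 'utf-8') -> Dict[str, str]:
    """Toy model of a str-only environment: undecodable entries are dropped."""
    result: Dict[str, str] = {}
    for key, value in env.items():
        try:
            result[key.decode(encoding)] = value.decode(encoding)
        except UnicodeDecodeError:
            # Silently skipped: str-only code can no longer see this variable.
            continue
    return result


raw_env = {b'HOME': b'/home/haypo', b'OLDPWD': b'/home/hayp\xf4'}
print(decodable_view(raw_env))  # {'HOME': '/home/haypo'}
```

With encoding='latin-1' every byte string decodes, so nothing is lost; the problem only appears when the data does not match the current charset.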
Command line
============
I like the current behaviour and I don't want to change it. Feel free to
propose a solution to this issue ;-)
Environment
===========
I already proposed "os.environb", which would have an API similar to
"os.environ" but with bytes. Relations between os.environb and
os.environ:
- for a variable whose value in os.environb is undecodable, os.environ will
raise a KeyError. Example with a UTF-8 charset and
os.environb[b'PATH'] = b'\xff': path = os.environ['PATH'] will raise a
KeyError, to keep the current behaviour.
- os.environ raises a UnicodeEncodeError if the key or value cannot be
encoded in the current charset. Example with an ASCII charset:
os.environ['PATH'] = '/home/hayp\xf4'
- apart from undecodable variable values in os.environb, os.environ and
os.environb will be consistent. Example: deleting a variable from
os.environb will also delete the key from os.environ.
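A minimal sketch of the relations listed above, assuming a plain in-memory dict instead of the real process environment. BytesEnviron and StrEnviron are hypothetical names for illustration only. The str view stores nothing of its own, so the two views stay consistent by construction: it decodes on read (undecodable value gives KeyError), encodes strictly on write (UnicodeEncodeError), and deletes from the shared bytes store.

```python
from collections.abc import MutableMapping
from typing import Dict, Iterator


class BytesEnviron(MutableMapping):
    """Hypothetical bytes-level store standing in for the proposed os.environb."""

    def __init__(self, initial=None):
        self._data: Dict[bytes, bytes] = dict(initial or {})

    def __getitem__(self, key: bytes) -> bytes:
        return self._data[key]

    def __setitem__(self, key: bytes, value: bytes) -> None:
        if not isinstance(key, bytes) or not isinstance(value, bytes):
            raise TypeError('keys and values must be bytes')
        self._data[key] = value

    def __delitem__(self, key: bytes) -> None:
        del self._data[key]

    def __iter__(self) -> Iterator[bytes]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


class StrEnviron(MutableMapping):
    """Hypothetical str view over a BytesEnviron; it keeps no data of its own."""

    def __init__(self, raw: BytesEnviron, encoding: str = 'utf-8'):
        self._raw = raw
        self._encoding = encoding

    def __getitem__(self, key: str) -> str:
        value = self._raw[key.encode(self._encoding)]
        try:
            return value.decode(self._encoding)
        except UnicodeDecodeError:
            # Undecodable value: hidden behind a KeyError (current behaviour).
            raise KeyError(key) from None

    def __setitem__(self, key: str, value: str) -> None:
        # Strict encoding: unencodable text raises UnicodeEncodeError.
        self._raw[key.encode(self._encoding)] = value.encode(self._encoding)

    def __delitem__(self, key: str) -> None:
        # Deleting through the str view deletes from the shared bytes store.
        del self._raw[key.encode(self._encoding)]

    def __iter__(self) -> Iterator[str]:
        # Only yield keys whose key and value both decode.
        for key, value in list(self._raw.items()):
            try:
                key_str = key.decode(self._encoding)
                value.decode(self._encoding)
            except UnicodeDecodeError:
                continue
            yield key_str

    def __len__(self) -> int:
        return sum(1 for _ in self)


raw = BytesEnviron({b'HOME': b'/home/haypo', b'PATH': b'\xff'})
env = StrEnviron(raw)               # UTF-8 view
print(env['HOME'])                  # /home/haypo
print('PATH' in env)                # False: b'\xff' is not valid UTF-8
del raw[b'HOME']                    # delete through the bytes view...
print('HOME' in env)                # False: ...and the str view follows
```

Sharing a single store avoids any synchronisation code, at the cost of decoding on every read.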
I think that most (or all) of these points are OK for everyone
(especially for Toshio Kuratomi and me :-)).
Now I have to try to write an implementation of this, but it's complex,
especially keeping os.environ and os.environb consistent!
Processes
=========
I proposed patches to fix non-Windows OSes, but Antoine Pitrou also wants
bytes on Windows. Amaury wrote that it's possible using the ANSI version of
the Windows API. I don't know this API, so I cannot contribute to this point.
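For problem (3) on non-Windows OSes, a sketch assuming a later Python 3 release (3.5 or newer, for subprocess.run) on a POSIX system, where os.environb and os.fsencode exist and subprocess accepts bytes for both the arguments and the environment mapping. The variable name DEMO_VAR is invented for the example. The child process writes the raw bytes of that variable back to its stdout, showing that undecodable bytes survive the trip.

```python
import os
import subprocess
import sys

# Child program, passed as a bytes argument: echo DEMO_VAR's raw bytes.
child_code = b"import os, sys; sys.stdout.buffer.write(os.environb[b'DEMO_VAR'])"

env = dict(os.environb)          # copy of the current environment, as bytes
env[b'DEMO_VAR'] = b'caf\xe9'    # Latin-1 bytes: not valid UTF-8

result = subprocess.run(
    [os.fsencode(sys.executable), b'-c', child_code],  # bytes arguments
    env=env,                                            # bytes keys and values
    stdout=subprocess.PIPE,
    check=True,
)
print(result.stdout)  # b'caf\xe9'
```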
---
Rejected idea
=============
Using a private Unicode block causes interoperability problems:
- the block may already be used by other programs/libraries
- 3rd party programs/libraries don't understand this block and may
have problems displaying/processing the data
(Is the idea really rejected? At least, it has many problems.)
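To make the first objection concrete, a sketch of the rejected scheme under a hypothetical mapping: each undecodable byte 0xNN becomes the private-use code point U+F700 + 0xNN. The 'pua-demo' error handler name and both helper functions are invented for this example. Raw bytes round-trip, but text that already contains one of those code points is silently turned into different bytes.

```python
import codecs

PUA_BASE = 0xF700  # hypothetical choice inside the Private Use Area (U+E000..U+F8FF)


def _pua_decode_handler(exc: UnicodeError):
    """Codec error handler: map each undecodable byte to U+F700 + byte."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return ''.join(chr(PUA_BASE + b) for b in bad), exc.end


codecs.register_error('pua-demo', _pua_decode_handler)


def decode_pua(data: bytes) -> str:
    return data.decode('utf-8', 'pua-demo')


def encode_pua(text: str) -> bytes:
    """Inverse mapping: PUA code points become raw bytes, everything else UTF-8."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if PUA_BASE <= cp <= PUA_BASE + 0xFF:
            out.append(cp - PUA_BASE)
        else:
            out += ch.encode('utf-8')
    return bytes(out)


raw = b'caf\xe9'                       # Latin-1 bytes, invalid as UTF-8
print(encode_pua(decode_pua(raw)))     # b'caf\xe9': round trip works

legit = '\uf7e9'.encode('utf-8')       # valid UTF-8 for a real PUA character
print(encode_pua(decode_pua(legit)))   # b'\xe9': the original bytes are lost
```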
---
I don't have any new solutions; this is just an email to restart the
discussion about bytes ;-) Martin also asked for a PEP to change the posix
module API to support bytes.
--
Victor Stinner aka haypo
http://www.haypocalc.com/blog/