[Python-Dev] Bytes for the command line, process arguments and environment variables

Victor Stinner victor.stinner at haypocalc.com
Sat Jan 3 04:29:11 CET 2009


Hi,

Python 3.0 is released and supports unicode everywhere, great! But as pointed 
out by several people, bytes are still required on non-Windows OSes for 
backward compatibility. This email is just a summary of the many issues and 
email threads.

Problems with Python 3.0:

 (1) Invalid unicode string on the command line
   => some people want to get the command line arguments as bytes,
      so that Python starts even if undecodable strings are present
      on the command line
   => http://bugs.python.org/issue3023

 (2) Undecodable environment variables are skipped in os.environ
   => Create os.environb (or anything else) to get these variables
      as bytes (and be able to set new variables as bytes)
   => Read the email thread "Python-3.0, unicode, and os.environ" 
      (December 2008) opened by Toshio Kuratomi

 (3) Support bytes for os.exec*() and subprocess.Popen(): process arguments 
   and the environment variables
   => http://bugs.python.org/issue4035: my patch for os.exec*()
   => http://bugs.python.org/issue4036: my patch for subprocess.Popen()


Command line
============

I like the current behaviour and I don't want to change it. Feel free to 
propose a solution to the issue ;-)


Environment
===========

I already proposed "os.environb", which would have the same API as 
"os.environ" but with bytes. Relations between os.environb and 
os.environ:

  - for an undecodable variable value in os.environb, os.environ will raise
    a KeyError. Example with utf8 charset and os.environb[b'PATH'] = b'\xff':
    path=os.environ['PATH'] will raise a KeyError, which keeps the current
    behaviour.

  - os.environ raises a UnicodeEncodeError if the key or value cannot be
    encoded in the current charset. Example with ASCII charset:
    os.environ['PATH'] = '/home/hayp\xf4'

  - except for undecodable variable values in os.environb, os.environ and
    os.environb will be consistent. Example: deleting a variable in 
    os.environb will also delete the key in os.environ.
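
A minimal sketch of the three relations above, assuming a single bytes
dictionary as the source of truth with a str view layered on top. The name
StrEnvironView and the explicit encoding parameter are made up for this
illustration; it is not an existing API nor a proposed patch.

```python
from collections.abc import MutableMapping


class StrEnvironView(MutableMapping):
    """str view over a bytes -> bytes dict (hypothetical, for illustration)."""

    def __init__(self, environb, encoding):
        self._b = environb          # shared bytes mapping, the only storage
        self._encoding = encoding   # stands in for the "current charset"

    def __getitem__(self, key):
        try:
            return self._b[key.encode(self._encoding)].decode(self._encoding)
        except UnicodeError:
            # Undecodable value (or unencodable key): act as if it is missing.
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        # str.encode() raises UnicodeEncodeError if the charset cannot
        # represent the key or the value.
        self._b[key.encode(self._encoding)] = value.encode(self._encoding)

    def __delitem__(self, key):
        del self._b[key.encode(self._encoding)]

    def __iter__(self):
        for raw_key, raw_value in self._b.items():
            try:
                raw_value.decode(self._encoding)
                yield raw_key.decode(self._encoding)
            except UnicodeDecodeError:
                continue  # entries that cannot be shown as str are skipped

    def __len__(self):
        return sum(1 for _ in self)


environb = {b"HOME": b"/home/haypo", b"PATH": b"\xff"}
environ = StrEnvironView(environb, "utf-8")

home = environ["HOME"]            # decodable: visible in both views
path_visible = "PATH" in environ  # undecodable value: hidden from the str view
del environb[b"HOME"]             # deleting on the bytes side...
home_visible = "HOME" in environ  # ...also removes it from the str view
```

Because both views share one dictionary, consistency comes for free in this
sketch. A real implementation also has to keep the process environment
(putenv/unsetenv) in sync, which is where it gets complex.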

I think that most of these points (or all points) are ok for everyone 
(especially ok for Toshio Kuratomi and me :-)).

Now I have to try to write an implementation of this, but it's complex, 
especially keeping os.environ and os.environb consistent!


Processes
=========

I proposed patches to fix non-Windows OSes, but Antoine Pitrou also wants 
bytes on Windows. Amaury wrote that it's possible using the ANSI version of 
the Windows API. I don't know this API and so I cannot contribute to this 
point.

---

Rejected idea
=============

Using a private Unicode block causes interoperability problems:
 - the block may already be used by other programs/libraries
 - 3rd party programs/libraries don't understand this block and may
   have problems displaying/processing the data

(Is the idea really rejected? At the very least, it has many problems.)
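
To make the first problem concrete, here is a rough sketch of such a scheme.
The thread does not fix a block or a mapping, so PUA_BASE, the "pua-sketch"
error handler name and both helper functions are assumptions made only for
this example: each undecodable byte is mapped to U+E000 plus the byte value.

```python
import codecs

PUA_BASE = 0xE000  # assumed start of the private block (not from the thread)


def _pua_errors(exc):
    """Decode error handler: map each undecodable byte into the private block."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(PUA_BASE + byte) for byte in bad), exc.end


codecs.register_error("pua-sketch", _pua_errors)


def decode_with_pua(raw, encoding="utf-8"):
    return raw.decode(encoding, errors="pua-sketch")


def encode_with_pua(text, encoding="utf-8"):
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if PUA_BASE + 0x80 <= code <= PUA_BASE + 0xFF:
            out.append(code - PUA_BASE)  # restore the original raw byte
        else:
            out += ch.encode(encoding)
    return bytes(out)


# Undecodable input survives a round trip...
roundtrip = encode_with_pua(decode_with_pua(b"abc\xff"))

# ...but valid UTF-8 that already contains U+E0FF (bytes EE 83 BF), written
# by some other program, is indistinguishable from an escaped 0xFF byte.
legit = "\ue0ff".encode("utf-8")
collision = decode_with_pua(legit) == decode_with_pua(b"\xff")
mangled = encode_with_pua(decode_with_pua(legit))
```

Here roundtrip equals the original bytes, but collision is True and mangled
is b"\xff" instead of the original three bytes: data that legitimately uses
the block is silently changed.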

---

I don't have new solutions, it's just an email to restart the discussion about 
bytes ;-) Martin also asked for a PEP to change the posix module API to 
support bytes.

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

