[Python-Dev] Python-3.0, unicode, and os.environ

Victor Stinner victor.stinner at haypocalc.com
Fri Dec 5 11:18:48 CET 2008


Hi,

On Thursday 04 December 2008 21:02:19, Toshio Kuratomi wrote:
> I opened up bug http://bugs.python.org/issue4006 a while ago and it was
> suggested in the report that it's not a bug but a feature and so I
> should come here to see about getting the feature changed :-)

Yeah, I prefer to discuss such changes on the mailing list.

> These mixed encodings can occur for a variety of reasons.  Here's an
> example that isn't too contrived :-)
> (...)
> Furthermore, they don't want to suffer from the space loss of using 
> utf-8 to encode Japanese so they use shift-jis everywhere.

"space loss"? Really? If you configure your server correctly, you should get 
UTF-8 even if the file system is Shift-JIS. But it would be much easier to 
use UTF-8 everywhere.

Hum... I don't think that the discussion is about one specific server, but 
about the lack of bytes environment variables in Python3 :-)

> 1) return mixed unicode and byte types in ...

NO!

> 2) return only byte types in os.environ

Hum... Most users have UTF-8 everywhere (eg. all Windows users ;-)), and 
Python3 already uses Unicode everywhere (input(), open(), filenames, ...).

> 3) silently ignore non-decodable value when accessing os.environ['PATH']
> as we do now but allow access to the full information via
> os.environ[b'PATH'] and os.getenvb()

I don't like os.environ[b'PATH']. I prefer to always get the same result 
type... But os.listdir() doesn't respect that :-(

   os.listdir(str) -> list of str
   os.listdir(bytes) -> list of bytes

I would prefer a similar API, for easier migration from Python2 to Python3 
(unicode). os.environb sounds like the best choice to me.
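
For reference, here is a small sketch of the os.listdir() behaviour described 
above: the result type follows the argument type. The temporary directory and 
the file name "script.py" are only illustrative.

```python
import os
import tempfile

# Create a scratch directory holding a single file (names are illustrative).
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "script.py"), "w").close()

    str_names = os.listdir(tmp)                  # str in   -> list of str
    bytes_names = os.listdir(os.fsencode(tmp))   # bytes in -> list of bytes

print(str_names, bytes_names)
```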


But there are open questions (already asked in the bug tracker):

(a) Should os.environ be updated if os.environb is changed? If yes, how?
   os.environb['PATH'] = b'\xff' (or any byte string that is invalid in the 
                                  system default encoding)
   => os.environ['PATH'] = ???

(b) Should os.environb be updated if os.environ is changed? If yes, how?

The problem comes with non-Unicode locales (eg. latin-1 or ASCII): most 
charsets are unable to encode the whole Unicode charset (eg. codes >= 65535).

   os.environ['PATH'] = chr(0x10000)
   => os.environb['PATH'] = ???

(c) Same question when a key is deleted (del os.environ['PATH']).
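
The two failure modes behind questions (a) and (b) can be reproduced with 
plain codecs, independently of any os.environb design. In this sketch the 
UTF-8 and latin-1 codecs stand in for "the system default encoding"; that 
choice is an assumption made only for illustration.

```python
# (a) bytes -> str: a byte string that is invalid in the chosen encoding.
try:
    b"\xff".decode("utf-8")
    decode_ok = True
except UnicodeDecodeError:
    decode_ok = False   # no str value is available for the unicode view

# (b) str -> bytes: a character that the locale charset cannot represent.
try:
    chr(0x10000).encode("latin-1")
    encode_ok = True
except UnicodeEncodeError:
    encode_ok = False   # no bytes value is available for the bytes view

print(decode_ok, encode_ok)
```

Whatever answer is chosen for (a) and (b) has to say what happens in these 
two except branches.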

If Python 3.1 has both os.environ and os.environb, I'm quite sure that some 
modules will use os.environ and others will prefer os.environb. If the two 
environments differ, the two sets of modules will behave differently :-/

It would maybe be easier if os.environ supported both bytes and unicode keys. 
But we would have to keep these invariants:
   os.environ[bytes] -> bytes
   os.environ[str] -> str
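
A minimal sketch of what such a mapping could look like, assuming a single 
bytes store underneath. The class name TypedEnviron, the fixed "utf-8" 
encoding and the sample values are all hypothetical; this is not how 
os.environ is implemented. Treating an undecodable value as a missing key is 
also just one possible choice.

```python
class TypedEnviron:
    """Toy mapping: bytes keys give bytes values, str keys give str values."""

    def __init__(self, raw, encoding="utf-8"):
        self._raw = dict(raw)        # the only store: bytes -> bytes
        self._encoding = encoding

    def __getitem__(self, key):
        if isinstance(key, bytes):
            return self._raw[key]
        try:
            raw_value = self._raw[key.encode(self._encoding)]
            return raw_value.decode(self._encoding)
        except UnicodeError:
            # Assumption: an undecodable entry looks like a missing key.
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        if type(key) is not type(value) or not isinstance(key, (bytes, str)):
            raise TypeError("key and value must both be bytes or both be str")
        if isinstance(key, str):
            key = key.encode(self._encoding)
            value = value.encode(self._encoding)
        self._raw[key] = value


# Illustrative data only.
env = TypedEnviron({b"HOME": b"/home/user", b"PATH": b"/bin:\xff"})
print(env["HOME"])     # str key   -> str value
print(env[b"PATH"])    # bytes key -> bytes value, even if undecodable
```

With a single store there is nothing to synchronise, but a str value that 
cannot be encoded still raises UnicodeEncodeError in __setitem__, so question 
(b) does not go away.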

> 4) raise an exception when non-decodable values are *accessed* and
> continue as in #3.

I like the os.listdir() behaviour: just *ignore* non-decodable filenames. If 
you really want to access these files, use a bytes directory name ;-)

> I think that the ease of debugging is lost when we silently ignore an error.

Guido gave a good example. Suppose your directory contains a non-decodable 
filename (eg. "???.txt"). If an exception were raised, glob('*.py') would fail 
because of the evil filename. With the current behaviour, you're unable to 
list all files, but glob('*.py') will list all Python scripts!
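
The "ignore" strategy can be sketched on a plain list of byte names, without 
touching the file system. The helper decodable_names() and the sample listing 
are hypothetical, and UTF-8 is assumed as the decoding charset.

```python
import fnmatch


def decodable_names(raw_names, encoding="utf-8"):
    """Return the names that decode cleanly, silently skipping the others."""
    names = []
    for raw in raw_names:
        try:
            names.append(raw.decode(encoding))
        except UnicodeDecodeError:
            continue   # skip the "evil" filename instead of failing
    return names


# Illustrative listing: one name is not valid UTF-8.
raw_listing = [b"setup.py", b"\xff\xfe\xfd.txt", b"main.py", b"README"]
scripts = fnmatch.filter(decodable_names(raw_listing), "*.py")
print(scripts)
```

The undecodable name is lost from the unicode view, but the pattern match 
over the remaining names still succeeds.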

And now that Python3 is released, it may be a bad idea to change the 
behaviour (of os.environ) in Python 3.1 :-/

> The bug report I opened suggests creating a PEP to address this issue.

Please try to answer my questions about os.environ and os.environb 
consistency.

I also like bytes environment variables. I need them for my fuzzing program. 
The lack of bytes variables is a regression from Python2 (for my program). On 
UNIX, filenames are bytes and the environment variables are bytes. For the 
best interoperability, Python3 should support bytes. But the default choice 
should always be characters (unicode) and to never mix the bytes and str 
types ;-)

---

As usual, it goes faster if someone writes a patch :-) I could try to work on 
it.

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

