[Python-Dev] Python-3.0, unicode, and os.environ
Victor Stinner
victor.stinner at haypocalc.com
Fri Dec 5 19:20:59 CET 2008
Hi,
> > But they are open questions (already asked in the bug tracker):
>
> I answered these in the bug tracker. Here are the answers for the
> mailing list:
Oh, sorry. I didn't follow the end of the discussion on the bug tracker.
> > os.environb['PATH'] = '\xff'
> > => os.environ['PATH'] = ???
>
> os.environ['PATH'] => raises KeyError because PATH is not a key in
> the unicode decoded environment.
Ok, good answer :-)
> > os.environ['PATH'] = chr(0x10000)
> > => os.environb['PATH'] = ???
>
> raise UnicodeEncodeError when setting the value.
Ok, it's consistent with the current behaviour.
$ LANG=C ./python
Python 3.0rc3+ (py3k:67498M, Dec 4 2008, 17:45:54)
>>> import os
>>> os.environ['x'] = '\xff'
>>> os.environ['x']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/haypo/prog/py3k/Lib/io.py", line 1491, in write
    b = encoder.encode(s)
  File "/home/haypo/prog/py3k/Lib/encodings/ascii.py", line 22, in encode
    return codecs.ascii_encode(input, self.errors)[0]
UnicodeEncodeError: 'ascii' codec can't encode character '\xff' in position 1: ordinal not in range(128)
Oh, that's strange :-p The error is delayed until we read the value.
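To make the semantics proposed in the quoted answers concrete, here is a minimal sketch, assuming the real environment is stored as a bytes-to-bytes dict. The class name StrEnvView and its constructor are hypothetical; this is not how os.environ is implemented.

```python
class StrEnvView:
    """Hypothetical str view over a bytes environment (not the real os.environ)."""

    def __init__(self, raw, encoding):
        self._raw = raw            # dict mapping bytes -> bytes
        self._encoding = encoding  # e.g. "ascii" when LANG=C

    def __getitem__(self, key):
        try:
            raw_value = self._raw[key.encode(self._encoding)]
            return raw_value.decode(self._encoding)
        except UnicodeError:
            # An entry that cannot be encoded or decoded is treated as
            # absent from the str view, so the caller sees a KeyError.
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        # Encode eagerly so that UnicodeEncodeError is raised on
        # assignment instead of being delayed until a later read.
        encoded_key = key.encode(self._encoding)
        encoded_value = value.encode(self._encoding)
        self._raw[encoded_key] = encoded_value


raw_env = {b"PATH": b"\xff", b"HOME": b"/home/haypo"}
env = StrEnvView(raw_env, "ascii")
```

With this sketch, env["HOME"] returns a str, env["PATH"] raises KeyError, and env["x"] = chr(0x10000) raises UnicodeEncodeError without modifying raw_env.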
> > It would maybe be easier if os.environ supported bytes and unicode keys.
> > But we have to keep these assertions:
> > os.environ[bytes] -> bytes
> > os.environ[str] -> str
>
> I think the same choices have to be made here. If LANG=C, we still have
> to decide what to do when os.environ[str] is set to a non-ASCII string.
If the charset is US-ASCII, os.environ will drop non-ASCII values. But most
variables are ASCII-only. Examples from my shell:
$ env
XCURSOR_THEME=kubuntu
LANG=fr_FR.UTF-8
EDITOR=vim
HOME=/home/haypo
...
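The "drop non-ASCII values" rule could look roughly like this. It is only a sketch, assuming the raw environment is available as a dict of bytes; the function name decode_environment and the NAME variable are hypothetical.

```python
def decode_environment(raw, encoding="ascii"):
    """Return a str dict, silently skipping entries that fail to decode."""
    decoded = {}
    for key, value in raw.items():
        try:
            decoded[key.decode(encoding)] = value.decode(encoding)
        except UnicodeDecodeError:
            # The entry stays in the bytes environment but is not
            # visible from the str side.
            continue
    return decoded


raw = {b"EDITOR": b"vim", b"HOME": b"/home/haypo", b"NAME": b"caf\xc3\xa9"}
ascii_env = decode_environment(raw)           # NAME is dropped
utf8_env = decode_environment(raw, "utf-8")   # NAME is kept
```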
> Additionally, the subprocess question makes using the key value
> undesirable compared with having a separate os.environb that accesses
> the same underlying data.
The user should be able to choose bytes or unicode. Examples:
- subprocess.Popen('ls') => use unicode environment (os.environ)
- subprocess.Popen(b'ls') => use bytes environment (os.environb)
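The selection rule in the two examples above can be sketched as follows. This is only an illustration of the idea; subprocess.Popen does not work this way, and the helper name select_environment is hypothetical.

```python
def select_environment(command, str_env, bytes_env):
    """Pick the environment whose type matches the command's type."""
    if isinstance(command, bytes):
        return bytes_env  # bytes command => bytes environment
    return str_env        # str command => unicode environment


str_env = {"HOME": "/home/haypo"}
bytes_env = {b"HOME": b"/home/haypo"}
chosen_for_str = select_environment("ls", str_env, bytes_env)
chosen_for_bytes = select_environment(b"ls", str_env, bytes_env)
```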
> Here's my problem with it, though. With these semantics any program
> that works on arbitrary files and runs on *NIX has to check
> os.listdir(b'') and do the conversion manually.
Only programs that have to support a strange environment like yours (mixing
Shift-JIS and UTF-8) :-) Most programs don't have to support such charset
mixtures.
We can imagine a higher-level library working on both UNIX and Windows (bytes
or Unicode). But that would come later.
> I think the desired behaviour assuming the existence of a nondecodable
> file is this:
I prefer the current behaviour :-)
> Why do you think that glob.glob('*.py') is special and should not traceback?
It's not special. glob() reuses listdir(), and it was an example to show
that "it just works".
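For reference, the str/bytes split in listdir() can be seen with a small sketch like this one. It assumes a later Python 3 release where os.fsencode() is available (it is not in 3.0); the helper name list_names is hypothetical.

```python
import os


def list_names(directory):
    """Return (str_names, bytes_names) for the same directory."""
    # A str path gives str file names.
    str_names = sorted(os.listdir(directory))
    # A bytes path gives the raw bytes file names, with no decoding step.
    bytes_names = sorted(os.listdir(os.fsencode(directory)))
    return str_names, bytes_names
```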
> I just differ in that I think lack of tracebacks when
> UnicodeDecodeErrors are encountered is a wart in python3 that did not
> exist in python2.
Right.
--
Victor Stinner aka haypo
http://www.haypocalc.com/blog/