[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different

STINNER Victor report at bugs.python.org
Sun Oct 10 19:59:25 CEST 2010


STINNER Victor <victor.stinner at haypocalc.com> added the comment:

> > What? No. We have problems because we don't use the same encoding to
> > decode and to encode the same data type. It's not a problem to use a
> > different encoding for each data type (stdout, filenames, environment
> > variables, ...).
> 
> This is exactly the very problem that we face. In particular, the
> question is what encoding to use if something is *both* a filename
> and an environment variable value, or both a filename and a command
> line argument.

The question is: what is the best default encoding for a specific data type? 
There is no perfect answer (well, except maybe using byte strings :-)). Each 
solution has its own use cases and disadvantages.

If an application knows the exact encoding of some data, and it is not the 
default encoding, it can still redecode the data. With os.environb, it's a 
little bit better: the application just has to decode (it doesn't have to 
encode first, or know which encoding was initially used to decode the data). 
For sys.argv, I still want to create sys.argvb (bytes version) ;-)
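
As a rough sketch of what "redecode" means here: the helper name redecode() 
and the sample encodings are assumptions made up for illustration, not an 
existing API. It relies on the surrogateescape error handler from PEP 383.

```python
def redecode(text, decoded_with, actual_encoding):
    """Undo a decode done with the wrong encoding, then decode again."""
    # surrogateescape (PEP 383) lets undecodable bytes round-trip through str
    raw = text.encode(decoded_with, "surrogateescape")
    return raw.decode(actual_encoding, "surrogateescape")

# Bytes as the OS would hand them over (UTF-8 encoded "xxxé.txt")
raw = "xxx\xe9.txt".encode("utf-8")

# str-only API: the application must know which encoding was used
# initially (ascii in this example) before it can decode again
as_str = raw.decode("ascii", "surrogateescape")
fixed_from_str = redecode(as_str, "ascii", "utf-8")

# bytes API (the os.environb situation): a single decode is enough
fixed_from_bytes = raw.decode("utf-8")

print(fixed_from_str == fixed_from_bytes)
```

With the bytes version, the application never needs to know which encoding 
Python picked for the initial decode.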

For the command line arguments and environment variables, we don't have a lot 
of choices: locale or filesystem encodings. So Antoine and Martin: which 
encoding do you prefer? We should maybe try to find some use cases.

Here is a dummy script bla.py:
---
import sys
print(sys.argv)
try:
    open(sys.argv[1]).close()
except Exception as err:
    print("open error: %s" % err)
else:
    print("open ok")
---

Locale encoding = FS encoding = utf-8:

$ ./python bla.py xxxé.txt 
['bla.py', 'xxxé.txt']
open ok

Locale encoding = utf-8, FS encoding = ascii:

$ PYTHONFSENCODING=ascii ./python bla.py xxxé.txt 
['bla.py', 'xxxé.txt']
open error: 'ascii' codec can't encode character '\xe9' ...

The filename is displayed correctly, but we are unable to open the file if 
PYTHONFSENCODING is used :-/ Should the filename be displayed differently if 
PYTHONFSENCODING is used?
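
The mismatch can be imitated without touching the filesystem. This is only a 
sketch: it assumes the argument is decoded with the locale encoding and the 
filename is encoded with the FS encoding, both with surrogateescape, and the 
variable names are invented for the example.

```python
locale_encoding = "utf-8"   # assumed: used to decode the command line
fs_encoding = "ascii"       # assumed: used to encode filenames

arg_bytes = b"xxx\xc3\xa9.txt"   # UTF-8 bytes of "xxxé.txt"

# Decode with one encoding, encode with the other: the encode step fails
arg = arg_bytes.decode(locale_encoding, "surrogateescape")
try:
    arg.encode(fs_encoding, "surrogateescape")
    outcome = "open ok"
except UnicodeEncodeError as err:
    outcome = "open error: %s" % err
print(outcome)

# Decode and encode with the same encoding: the original bytes come back,
# even though ascii cannot represent the character
arg_fs = arg_bytes.decode(fs_encoding, "surrogateescape")
roundtrip_ok = arg_fs.encode(fs_encoding, "surrogateescape") == arg_bytes
print(roundtrip_ok)
```

The second half shows the trade-off: using the FS encoding on both sides 
keeps the bytes intact, but the decoded string then contains surrogates 
instead of a readable é.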

> I think these problems are sufficiently resolved now: either by
> PEP 3333, PEP 444, PEP 383, or os.environb.

Ok, cool :-)

> I think you misunderstood MAL's comment, though: the environment
> variables are not encoded in *any* specific encoding. Instead,
> they are copied literally from the HTTP request, using whatever
> bytes the browser originally put in there - which may or may
> not have followed a particular encoding. HTTP is silent on
> this most of the time, and HTML is out of scope.

Ah yes, thanks for your explanation. I was unable to find his comment.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9992>
_______________________________________

