[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different

Antoine Pitrou report at bugs.python.org
Sat Oct 9 11:45:19 CEST 2010


Antoine Pitrou <pitrou at free.fr> added the comment:

> Antoine: Python cannot possibly know whether a command line argument
> is meant as a file name or as some other text, and what encoding the
> receiving application will apply to it (if any).

I understand. But practicality suggests that, most of the time,
non-ASCII arguments on a command line will be filenames. We should
probably favour the common case, barring implementation issues (it
seems that using the filesystem encoding during the interpreter's
bootup phase is not easy).
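
A minimal sketch of why the choice matters for filenames. The helper
name `reinterpret` and the sample filename are made up for illustration;
latin-1 and UTF-8 simply stand in for a locale encoding and a filesystem
encoding that disagree:

```python
def reinterpret(raw: bytes, decode_with: str, encode_with: str) -> bytes:
    """Decode raw argument bytes with one codec, then encode with another.

    Hypothetical helper: it mimics an argument that was decoded with the
    locale encoding and later encoded with the filesystem encoding.
    """
    text = raw.decode(decode_with, "surrogateescape")
    return text.encode(encode_with, "surrogateescape")


# Bytes a shell would pass for a UTF-8 filename (illustrative value).
raw_name = "\u00e9.txt".encode("utf-8")

# Same codec in both directions: the original bytes come back unchanged.
same = reinterpret(raw_name, "utf-8", "utf-8")

# Mismatched codecs: the resulting bytes no longer name the same file.
mismatched = reinterpret(raw_name, "latin-1", "utf-8")

print(same == raw_name)        # True
print(mismatched == raw_name)  # False
```

If the argument was a filename, the mismatched bytes would make open()
look for a file that does not exist.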

> So perhaps it would be best if Python had two external default
> encodings: the IO one (command line arguments, environment variables,
> text files), and the file name encoding (defaulting to the IO encoding
> if not set).

Looking at the environment variables here, their values seem to be one of:
- integers (pids, port numbers...)
- conventional values (such as "fr_FR.utf8")
- usernames
- file paths

The most likely values to be non-ASCII are, therefore, file paths. So it
would make sense to also use the filesystem encoding for environment
variables (so as to satisfy the common case).
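
As a rough sketch of what "filesystem encoding for environment
variables" could look like from user code: the helper `env_as_path` and
the `GR_DEMO_*` names are hypothetical, and the sketch relies on
os.fsencode(), os.fsdecode() and os.environb as found in later Python 3
releases, not on anything settled in this discussion:

```python
import os


def env_as_path(name: str, default=None):
    """Return an environment variable decoded as a file path.

    Hypothetical helper. os.fsdecode() applies the filesystem encoding.
    """
    if os.supports_bytes_environ:
        # POSIX: start from the raw bytes so the decoding step is explicit.
        raw = os.environb.get(os.fsencode(name))
        return default if raw is None else os.fsdecode(raw)
    # Platforms without a bytes environment (such as Windows) expose str only.
    return os.environ.get(name, default)


os.environ["GR_DEMO_DIR"] = "/tmp/demo"  # illustrative name and value
print(env_as_path("GR_DEMO_DIR"))
print(env_as_path("GR_DEMO_MISSING", "fallback"))
```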

As for text files, I agree it's different, and the encoding choice
routine in TextIOWrapper already favours locale.getpreferredencoding()
and ignores the filesystem encoding.
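
To see which encodings are in play on a given machine, something like
the following can be used. The function name `default_encodings` is
made up for this sketch; codec names are normalised with codecs.lookup()
so that spellings such as "UTF-8" and "utf8" compare equal:

```python
import codecs
import io
import locale
import sys


def default_encodings() -> dict:
    """Report the encodings chosen for text I/O, the locale and filenames."""
    # Wrap an in-memory buffer without naming an encoding, so that
    # TextIOWrapper falls back to its default choice.
    wrapper = io.TextIOWrapper(io.BytesIO())
    return {
        "text_io": codecs.lookup(wrapper.encoding).name,
        "locale": codecs.lookup(locale.getpreferredencoding(False)).name,
        "filesystem": codecs.lookup(sys.getfilesystemencoding()).name,
    }


for kind, name in default_encodings().items():
    print(kind, name)
```

The "text_io" and "locale" entries are expected to agree, while
"filesystem" may or may not match them depending on the platform and
its configuration.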

> If we have tests that rely on the fsname encoding and the IO encoding
> being the same, then those tests should get skipped if the encodings
> are actually different.

Agreed, but only when this discussion has come to a conclusion :)
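
For reference, such a skip could be sketched along these lines. The
helper `encodings_differ` and the test case are hypothetical and are
not taken from the existing test suite:

```python
import codecs
import locale
import sys
import unittest


def encodings_differ() -> bool:
    """True if the filesystem and locale encodings are different codecs."""
    fs = codecs.lookup(sys.getfilesystemencoding()).name
    loc = codecs.lookup(locale.getpreferredencoding(False)).name
    return fs != loc


class NonAsciiNameTest(unittest.TestCase):
    @unittest.skipIf(encodings_differ(),
                     "filesystem and locale encodings differ")
    def test_roundtrip(self):
        name = "caf\u00e9"  # illustrative non-ASCII name
        try:
            raw = name.encode(sys.getfilesystemencoding())
        except UnicodeEncodeError:
            self.skipTest("name is not encodable with the filesystem encoding")
        # Only meaningful when both encodings are the same codec.
        self.assertEqual(raw.decode(locale.getpreferredencoding(False)), name)
```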

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9992>
_______________________________________

