[issue8514] Create fsencode() and fsdecode() functions in os.path
STINNER Victor
report at bugs.python.org
Fri Apr 30 18:05:29 CEST 2010
STINNER Victor <victor.stinner at haypocalc.com> added the comment:
On Friday 30 April 2010 15:58:28, you wrote:
> It's better to let the application decide how to solve this problem
> and in order to allow for this, the encodings must be adjustable.
On POSIX, use byte strings to avoid encoding issues. Examples:
subprocess.call(['env'], env={b'TEST': b'a\xff-'}) # env
subprocess.call(['echo', b'a\xff-']) # command line
open(b'a\xff-') # filename
os.getenv(b'a\xff-') # get env (result as unicode)
Are you talking about issues on Windows?
> By using fsencode() and fsdecode() in stdlib functions, you basically
> prevent this kind of adjustment, ...
Not if you use byte strings. On POSIX, a unicode string is always converted
at the end for the system call (using sys.getfilesystemencoding()).
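
To illustrate that conversion, here is a minimal sketch of the round trip between OS bytes and unicode strings with the 'surrogateescape' error handler. The helper names from_fs_bytes() and to_fs_bytes() are hypothetical, and the example pins the encoding to UTF-8 as an assumption so that it does not depend on the locale:

```python
import sys

def from_fs_bytes(raw, encoding=None):
    """Decode OS bytes to str; undecodable bytes become lone surrogates."""
    encoding = encoding or sys.getfilesystemencoding()
    return raw.decode(encoding, 'surrogateescape')

def to_fs_bytes(text, encoding=None):
    """Encode str back to bytes, restoring any escaped bytes unchanged."""
    encoding = encoding or sys.getfilesystemencoding()
    return text.encode(encoding, 'surrogateescape')

# 0xFF is not valid UTF-8, so it is escaped as the lone surrogate U+DCFF.
name = from_fs_bytes(b'a\xff-', 'utf-8')
assert name == 'a\udcff-'
# Encoding with the same handler gives back the original bytes.
assert to_fs_bytes(name, 'utf-8') == b'a\xff-'
```

The point of the sketch is that the byte string survives the trip through unicode unchanged, as long as the same encoding and error handler are used in both directions.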
> If you know that e.g. your environment variables are going to have
> Latin-1 data (say some content-type variable has this information),
> but the user's default LANG setting is UTF-8, Python will fetch the
> data as broken Unicode data, you then have to convert it back to bytes
> and then back to Unicode using the correct Latin-1 encoding.
>
> It would be a lot better to have the application provide the
> encoding to the os.getenv() function and have Python do the
> correct decoding right from the start.
You mean that os.getenv() should have an optional argument? Something like:
   def getenv(key, default=None, encoding=None):
       value = environ.get(key, default)
       # Re-decode only if an encoding was requested and the variable is set
       if encoding and value is not None:
           value = value.encode(sys.getfilesystemencoding(), 'surrogateescape')
           value = value.decode(encoding, 'surrogateescape')
       return value
There are many indirect calls to os.getenv() (e.g. through os.environ.get()):
- curses uses TERM
- webbrowser uses PROGRAMFILES (path)
- distutils.msvc9compiler uses "VS%0.f0COMNTOOLS" % version (path)
- wsgiref.util uses HTTP_HOST, SERVER_NAME, SCRIPT_NAME, ... (url)
- platform uses PROCESSOR_ARCHITEW6432
- sysconfig uses PYTHONUSERBASE, APPDATA, ... (path)
- idlelib.PyShell uses IDLESTARTUP and PYTHONSTARTUP (path)
- ...
How would you specify the correct encoding in indirect calls?
If your application gets variables in *mixed* encodings, I think that your
program should start by re-encoding the variables:
   for name, encoding in (('PATH', 'latin1'), ...):
       value = os.getenv(name)
       value = value.encode(sys.getfilesystemencoding(), 'surrogateescape')
       value = value.decode(encoding, 'surrogateescape')
       os.environ[name] = value
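
The same re-encoding step can be sketched against a plain dict, so it can be tried without touching the real environment. This is only an illustration: the function name reencode_vars() and the sample PATH value are hypothetical, and the filesystem encoding is assumed to be UTF-8 in the example:

```python
import sys

def reencode_vars(env, encodings, fs_encoding=None):
    """Return a copy of env with the selected values re-decoded.

    env: mapping of name -> str, as decoded with fs_encoding + surrogateescape
    encodings: mapping of name -> encoding that the raw bytes really use
    """
    fs_encoding = fs_encoding or sys.getfilesystemencoding()
    result = dict(env)
    for name, encoding in encodings.items():
        if name not in result:
            continue  # variable not set: nothing to fix
        # Recover the original bytes, then decode them with the right codec.
        raw = result[name].encode(fs_encoding, 'surrogateescape')
        result[name] = raw.decode(encoding, 'surrogateescape')
    return result

# Latin-1 bytes that were decoded as UTF-8: the 0xE9 byte became a surrogate.
broken = {'PATH': b'/caf\xe9/bin'.decode('utf-8', 'surrogateescape')}
fixed = reencode_vars(broken, {'PATH': 'latin1'}, fs_encoding='utf-8')
assert fixed['PATH'] == '/caf\xe9/bin'
```

Variables that are not listed in the encodings mapping are copied unchanged, which matches the idea of fixing only the names known to use another encoding.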
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8514>
_______________________________________