[issue8622] Add PYTHONFSENCODING environment variable

Marc-Andre Lemburg report at bugs.python.org
Wed Aug 18 21:06:23 CEST 2010


Marc-Andre Lemburg <mal at egenix.com> added the comment:

STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner at haypocalc.com> added the comment:
> 
>> The command line -h explanation is missing from the patch.
> 
> done
> 
>> The documentation should mention that the env var is only
>> read once; subsequent changes to the env var are not seen
>> by Python
> 
> I copied the PYTHONIOENCODING doc which doesn't mention that. Does Python re-read other environment variables at runtime? Anyway, I changed the doc to:
> 
> +   If this is set before running the interpreter, it overrides the encoding used
> +   for the filesystem encoding (see :func:`sys.getfilesystemencoding`).
> 
> I also changed PYTHONIOENCODING doc. Is it better?

Yes, thanks.
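
As a rough illustration of the "read once" point: the value reported by sys.getfilesystemencoding() is fixed at startup, so assigning to os.environ inside a running process does not change it. The sketch below only demonstrates that behaviour and does not depend on whether the running interpreter recognizes PYTHONFSENCODING at all. The helper name fsencoding_after_env_change() is made up for this example.

```python
import os
import sys


def fsencoding_after_env_change(value):
    """Set PYTHONFSENCODING at runtime; return the encoding before and after."""
    before = sys.getfilesystemencoding()
    saved = os.environ.get("PYTHONFSENCODING")
    os.environ["PYTHONFSENCODING"] = value
    try:
        # The interpreter does not re-read the environment here.
        after = sys.getfilesystemencoding()
    finally:
        # Restore the environment so the example has no side effects.
        if saved is None:
            del os.environ["PYTHONFSENCODING"]
        else:
            os.environ["PYTHONFSENCODING"] = saved
    return before, after


before, after = fsencoding_after_env_change("ascii")
print(before, after)
```

Both printed values are the same, which is why the documentation should say that the variable has to be set before the interpreter starts.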

>> If the codec lookup fails, Python should either issue a warning
> 
> Ok, done. I also patched get_codeset() and get_codec_name() to always set a Python error.
> 
>> ... and then ignore the env var (using the get_codeset() API).
> 
> Good idea, done.
> 
>> Unrelated to the env var, but still important: if get_codeset()
>> does not return a known codec, Python should issue a warning
>> before falling back to the default setting. Otherwise, a
>> Python user will never know that there's an issue and this
>> makes debugging a lot harder.
> 
> It already writes a message to stderr, but it doesn't explain why it failed.
> 
> I changed initfsencoding() to display two messages on a get_codeset() error. The first explains why get_codeset() failed (with the Python error), and the second says that we fall back to utf-8.
> 
> Full example (PYTHONFSENCODING error and simulated get_codeset() error):
> ---
> PYTHONFSENCODING is not a valid encoding:
> LookupError: unknown encoding: xxx
> Unable to get the locale encoding:
> ValueError: CODESET is not set or empty
> Unable to get the filesystem encoding: fallback to utf-8
> ---

Looks good !
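
For readers following along, the fallback order discussed above can be sketched in Python. This is not the C code in initfsencoding(); choose_fs_encoding() and its parameters are hypothetical, and the locale codeset is passed in as a plain string so that the failure case can be simulated.

```python
import codecs


def choose_fs_encoding(env_value, locale_codeset):
    """Return (encoding, messages), trying the env var, then the locale, then utf-8."""
    messages = []
    if env_value:
        try:
            return codecs.lookup(env_value).name, messages
        except LookupError as exc:
            # Warn, then ignore the env var and continue with the locale.
            messages.append("PYTHONFSENCODING is not a valid encoding:")
            messages.append("LookupError: %s" % exc)
    try:
        if not locale_codeset:
            raise ValueError("CODESET is not set or empty")
        return codecs.lookup(locale_codeset).name, messages
    except (LookupError, ValueError) as exc:
        messages.append("Unable to get the locale encoding:")
        messages.append("%s: %s" % (type(exc).__name__, exc))
    messages.append("Unable to get the filesystem encoding: fallback to utf-8")
    return "utf-8", messages


encoding, messages = choose_fs_encoding("xxx", "")
print("\n".join(messages))
print(encoding)
```

With an invalid env var and an empty codeset, the sketch emits the same five lines as the full example quoted above and ends up with utf-8.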

>> We should also add a new sys.setfilesystemencoding() ...
> 
> No, I plan to REMOVE this function. sys.setfilesystemencoding() is dangerous because it introduces a lot of inconsistencies: this function is unable to reencode all filenames in all objects (e.g. Python is unable to find filenames in user objects or 3rd party libraries). E.g. if you change the filesystem encoding from utf8 to ascii, it will not be possible to use existing non-ascii (unicode) filenames: they will raise UnicodeEncodeError. Like sys.setdefaultencoding() in Python 2, I think that sys.setfilesystemencoding() is the root of evil :-)

Sorry, I wasn't aware we had such a function (and was looking at the
wrong file so didn't find it).
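
The utf8-to-ascii case is easy to reproduce with plain str.encode(). In this small sketch the helper name can_encode() is made up, and it uses the surrogateescape error handler from PEP 383, which does not help here because the character is a real non-ASCII character rather than an escaped byte.

```python
def can_encode(filename, encoding):
    """Return True if filename can be encoded with the given filesystem encoding."""
    try:
        filename.encode(encoding, "surrogateescape")
    except UnicodeEncodeError:
        return False
    return True


name = "caf\u00e9.txt"  # a filename that was valid while the encoding was utf-8
print(can_encode(name, "utf-8"))
print(can_encode(name, "ascii"))
```

The first call succeeds and the second one fails, so any such filename already stored in an object becomes unusable once the encoding is switched to ascii.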

> At startup, initfsencoding() sets the filesystem encoding using the locale encoding. Even for the startup process (with very few objects), it's very hard to find all filenames:
>  - sys.path
>  - sys.meta_path
>  - sys.modules
>  - sys.executable
>  - all code objects
>  - and I'm not sure that the list is complete
> 
> See #9630 for the details.
> 
> To remove sys.setfilesystemencoding(), I already patched the PEP 383 tests (r84170) and I will open a new issue. But it may be better to commit both changes (remove the function and add PYTHONFSENCODING) at the same time.
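
To give an idea of how many places hold filenames even in a small process, here is a sketch that collects the easy ones from the list above (sys.path, sys.executable and the __file__ of loaded modules). startup_filenames() is a hypothetical helper; it deliberately skips sys.meta_path and code objects, so it is incomplete in the same way the quoted list is.

```python
import sys


def startup_filenames():
    """Collect filename strings from a few well-known interpreter objects."""
    names = set()
    names.update(entry for entry in sys.path if isinstance(entry, str))
    if sys.executable:
        names.add(sys.executable)
    # Copy the values first: importing during iteration could change sys.modules.
    for module in list(sys.modules.values()):
        filename = getattr(module, "__file__", None)
        if isinstance(filename, str):
            names.add(filename)
    return names


print(len(startup_filenames()))
```

Each of these strings would have to be re-encoded after a change of the filesystem encoding, and this sketch does not even look at user objects or 3rd party libraries.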

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8622>
_______________________________________

