[Python-Dev] Low-Level Encoding Behavior on Python 3

Stefan Behnel stefan_ml at behnel.de
Wed Mar 16 17:11:21 CET 2011


Armin Ronacher, 16.03.2011 16:57:
> On 3/16/11 3:48 AM, Antoine Pitrou wrote:
>> I may be mistaken, but you seem to conflate two things: encoding of
>> file names, and encoding of file contents. I guess that virtualenv
>> chokes on the file contents, but most of your argument seems related to
>> encoding of file names (aka "filesystem encoding").
> These are two pretty unrelated problems but both are problems nonetheless.
> The filename encoding should not be guessed from the environment variables
> as those are from the connecting client. The default encoding for file
> contents also should not be platform dependent. It *will* lead to people
> thinking it works when in practice it will break if they move their code to
> a remote server and SSH into it and then trigger the code execution.
>
> I argue that the first is just wrong (filename encoding guessing) and the
> latter is dangerous (file content encoding being platform dependent).

Antoine was arguing that it's not CPython's fault if virtualenv expects it
to correctly guess the encoding of a file that virtualenv wants to read.
CPython makes an educated guess based on the current environment setup, and
if the environment is not correctly configured, that's the user's fault. As
you indicated yourself, the guess does work most of the time. That's all
you should expect from a default.
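
To make the "educated guess" concrete, here is a minimal sketch of how to
inspect it. It assumes the documented behaviour that open() in text mode
falls back to locale.getpreferredencoding(False) when no encoding is given.
The helper name default_text_encoding is made up for illustration.

```python
import locale


def default_text_encoding():
    # Hypothetical helper: reports the encoding that text-mode open() is
    # documented to fall back to when no encoding= argument is passed.
    # The value is derived from the process environment (locale settings),
    # so it can differ between a local shell and an SSH session.
    return locale.getpreferredencoding(False)


guess = default_text_encoding()
print("open() would default to:", guess)
```

Running this once locally and once over SSH on the target server is a quick
way to see whether the two environments agree.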


> virtualenv itself is already fixed and explicitly tells it to read with
> UTF-8 encoding.

That's the right way to deal with encoded file content.
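
As a rough sketch of what "explicitly tells it" means (this is not
virtualenv's actual code; the helper name read_text_utf8 and the sample file
are assumptions for illustration), passing encoding= to open() takes the
environment out of the picture:

```python
import os
import tempfile


def read_text_utf8(path):
    # Hypothetical helper: decode the file as UTF-8 regardless of what
    # the locale or LANG/LC_* environment variables say.
    with open(path, encoding="utf-8") as f:
        return f.read()


# Usage: write a file containing non-ASCII text, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("na\u00efve caf\u00e9")
    text = read_text_utf8(path)
```

The same call behaves identically on a desktop and in a remote SSH session,
because nothing in it depends on the platform default.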

Stefan


