How does Python get the value for sys.stdin.encoding?

Benjamin Kaplan benjamin.kaplan at case.edu
Wed Aug 11 22:24:49 EDT 2010


On Wed, Aug 11, 2010 at 6:21 PM, RG <rNOSPAMon at flownet.com> wrote:
> I thought it was hard-coded into the Python executable at compile time,
> but that is apparently not the case:
>
> [ron at mickey:~]$ python
> Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
> [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
>>>> import sys;print sys.stdin.encoding
> UTF-8
>>>> ^D
> [ron at mickey:~]$ echo 'import sys;print sys.stdin.encoding' | python
> None
> [ron at mickey:~]$
>
> And indeed, trying to pipe unicode into Python doesn't work, even though
> it works fine when Python runs interactively.  So how can I make this
> work?
>

sys.stdin and sys.stdout are file objects, just like any other. There's nothing
special about them at compile time. When the interpreter starts, it
checks whether they are attached to a terminal (a tty). If they are, it tries
to work out the terminal's encoding from the locale settings in the
environment. The code for this is in pythonrun.c if you want to see exactly
what it does. If stdin and stdout aren't ttys (a pipe, for example), their
encoding stays None, and the interpreter falls back to
sys.getdefaultencoding() (normally 'ascii' in Python 2) when you print
Unicode strings.
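
To make that rule concrete, here is a rough Python 3 model of it. This is a
sketch, not a transcription of pythonrun.c: the helper name
guess_stream_encoding is made up, and locale.getpreferredencoding(False)
stands in for the C-level locale lookup the interpreter performs. The real
startup code handles more cases (Python 2.6, for instance, also lets the
PYTHONIOENCODING environment variable override the result).

```python
import io
import locale
from typing import IO, Optional


def guess_stream_encoding(stream: IO) -> Optional[str]:
    """Hypothetical model of the startup rule described above.

    A stream attached to a terminal gets an encoding derived from the
    locale; anything else (pipes, regular files) gets None.
    """
    try:
        is_terminal = stream.isatty()
    except (AttributeError, ValueError):
        # Closed or unusual file-like objects are treated as "not a tty".
        is_terminal = False
    if is_terminal:
        # Stand-in for the interpreter's own locale/CODESET lookup.
        return locale.getpreferredencoding(False)
    return None


# A BytesIO is never a terminal, so it behaves like a pipe here.
print(guess_stream_encoding(io.BytesIO(b"piped data")))  # None
```

Note that this only models the 2.x behavior discussed in this thread; Python 3
itself gives a piped sys.stdin a locale-derived encoding instead of None.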

By the way, there is no such thing as piping Unicode into Python.
Unicode is an abstract concept in which each character maps to a
code point. Pipes can only deal with bytes. You are probably using one of
the encodings that can represent the entire range of Unicode characters
(such as UTF-8, UTF-16, or UTF-32, the last two in either byte order), but
that's not the same thing as Unicode. You really have to watch your
encodings when you pass data between programs. There's no way to avoid
it.
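
Here is a small Python 3 illustration of why the encoding has to be agreed on
out of band: the same text turns into different bytes under different codecs,
and decoding only round-trips when the reader uses the codec the writer used.
The text_reader helper is hypothetical; it just wraps io.TextIOWrapper with an
explicit encoding.

```python
import io
from typing import BinaryIO, TextIO

text = "caf\u00e9"  # 'café'

# The same four characters become different byte sequences per codec.
utf8_bytes = text.encode("utf-8")        # b'caf\xc3\xa9'
utf16_bytes = text.encode("utf-16-le")   # b'c\x00a\x00f\x00\xe9\x00'


def text_reader(raw: BinaryIO, encoding: str = "utf-8") -> TextIO:
    """Hypothetical helper: decode a byte stream with an explicit codec.

    The stream carries no record of how its bytes were produced, so the
    caller has to supply (i.e. agree on) the encoding.
    """
    return io.TextIOWrapper(raw, encoding=encoding)


# Reading with the codec the writer used recovers the original text.
assert text_reader(io.BytesIO(utf8_bytes)).read() == text
assert text_reader(io.BytesIO(utf16_bytes), "utf-16-le").read() == text

# Guessing the wrong codec silently yields different text (mojibake).
wrong = utf8_bytes.decode("latin-1")     # 'cafÃ©'
```

For real piped input under Python 3 you would pass sys.stdin.buffer (the
underlying byte stream) to text_reader. Under Python 2.6 the usual equivalent
is codecs.getreader('utf-8')(sys.stdin), which yields unicode objects.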
