What is file.encoding convention?

Vinay Sajip vinay_sajip at yahoo.co.uk
Thu Jul 23 11:39:50 EDT 2009


On Jul 23, 4:06 am, Naoki INADA <songofaca... at gmail.com> wrote:
> In document <http://docs.python.org/library/stdtypes.html#file.encoding>:
>
> >> The encoding that this file uses. When Unicode strings are written to a file,
> >> they will be converted to byte strings using this encoding. In addition,
> >> when the file is connected to a terminal, the attribute gives the encoding
> >> that the terminal is likely to use.
>
> But in logging.StreamHandler.emit() ::
>
>                 try:
>                     if (isinstance(msg, unicode) and
>                         getattr(stream, 'encoding', None)):
>                         #fs = fs.decode(stream.encoding)
>                         try:
>                             stream.write(fs % msg)
>                         except UnicodeEncodeError:
>                             #Printing to terminals sometimes fails. For example,
>                             #with an encoding of 'cp1251', the above write will
>                             #work if written to a stream opened or wrapped by
>                             #the codecs module, but fail when writing to a
>                             #terminal even when the codepage is set to cp1251.
>                             #An extra encoding step seems to be needed.
>                             stream.write((fs % msg).encode(stream.encoding))
>                     else:
>                         stream.write(fs % msg)
>                 except UnicodeError:
>                     stream.write(fs % msg.encode("UTF-8"))
>
> And the behavior of sys.stdout on Windows::
>
> >>> import sys
> >>> sys.stdout.encoding
> 'cp932'
> >>> u = u"あいう"
> >>> u
> u'\u3042\u3044\u3046'
> >>> print >>sys.stdout, u
> あいう
> >>> sys.stderr.write(u)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>
> What is the file.encoding convention?
> If I want to write a unicode string to a file(-like) object that has an
> encoding attribute, should I do
> (1) try: file.write(unicode_str),
> (2) except UnicodeEncodeError: file.write(unicode_str.encode(file.encoding))
> like logging does?
> It seems ugly.
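For illustration, the (1)/(2) fallback pattern asked about above can be sketched as a small helper. This is a minimal sketch, not the logging module's code, and it is written for Python 3, where `str` is Unicode (the thread itself concerns Python 2's `unicode` type). The helper `write_text` and the class `BytesOnlyStream` are hypothetical names; `BytesOnlyStream` stands in for a terminal-like stream that advertises an `encoding` but only accepts bytes. Because a Python 3 byte-only stream raises `TypeError` rather than `UnicodeEncodeError` when given `str`, the sketch catches both.

```python
import io


def write_text(stream, text):
    """Hypothetical helper mirroring the (1)/(2) pattern in the question.

    First try writing `text` (a str) unchanged; if the stream rejects it,
    encode it with the stream's own `encoding` attribute and write bytes.
    """
    encoding = getattr(stream, "encoding", None)
    try:
        stream.write(text)
    except (UnicodeEncodeError, TypeError):
        # A Python 3 byte-only stream raises TypeError when given str;
        # the Python 2 terminals discussed in the thread raised
        # UnicodeEncodeError. Either way, fall back to writing bytes.
        if not encoding:
            raise
        stream.write(text.encode(encoding))


class BytesOnlyStream(io.BytesIO):
    """Hypothetical terminal-like stream: it advertises an encoding
    but only accepts bytes."""

    encoding = "cp932"


text = "\u3042\u3044\u3046"  # "あいう", as in the quoted session

# A text wrapper accepts str directly and encodes it itself (step 1 succeeds).
wrapped = io.TextIOWrapper(io.BytesIO(), encoding="cp932")
write_text(wrapped, text)
wrapped.flush()

# The byte-only stream rejects str, so the helper encodes first (step 2).
term = BytesOnlyStream()
write_text(term, text)

print(wrapped.buffer.getvalue() == term.getvalue())
```

Note that the fallback only helps streams that want bytes: if a text stream rejects a character its encoding cannot represent, the `encode()` call in the fallback fails in the same way.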

If you are writing a Unicode string to a stream which has been opened
with, e.g., codecs.open with a specific encoding, then the stream is
actually a wrapper. You can write Unicode strings directly to it, and
the wrapper will encode the Unicode to bytes using that encoding and
write those bytes to the underlying stream. In your example you didn't
show sys.stderr.encoding; you showed sys.stdout.encoding and printed
something to sys.stdout, which seemed to give the correct result, but
then wrote to sys.stderr, which gave a UnicodeEncodeError. What is the
encoding of sys.stderr in your example? Also note that logging had to
handle what appeared to be an oddity with terminals: they (at least
sometimes) have an encoding attribute but appear to expect bytes, not
Unicode, to be written to them. Hence the logging kludge, which should
not be needed and so has been carefully commented.
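The wrapper behaviour described above can be seen with the standard `codecs` module. This is a minimal sketch written for Python 3 (where `str` is Unicode); to stay self-contained it wraps an in-memory `io.BytesIO` with `codecs.getwriter()` instead of opening a real file with `codecs.open()`, and the `cp932` encoding simply mirrors the quoted session. The variable names are illustrative. The last few lines check each standard stream's `encoding` attribute, which is the information asked for about sys.stderr.

```python
import codecs
import io
import sys

# Underlying byte stream (a file opened in binary mode would behave the same).
raw = io.BytesIO()

# codecs.getwriter() looks up the StreamWriter class for a codec; calling it
# wraps the byte stream. codecs.open() builds a similar wrapper around a file.
writer = codecs.getwriter("cp932")(raw)

# Unicode text goes in; the wrapper encodes it and writes bytes to `raw`.
writer.write("\u3042\u3044\u3046")  # "あいう"
print(raw.getvalue())

# Check each standard stream's encoding attribute separately rather than
# assuming sys.stderr uses the same encoding as sys.stdout.
for name in ("stdout", "stderr"):
    stream = getattr(sys, name)
    print(name, getattr(stream, "encoding", None))
```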

Regards,

Vinay Sajip


