python 2.7 and unicode (one more time)

Chris Angelico rosuav at gmail.com
Thu Nov 20 11:46:24 EST 2014


On Fri, Nov 21, 2014 at 3:32 AM, Peter Otten <__peter__ at web.de> wrote:
> Chris Angelico wrote:
>
>> On Fri, Nov 21, 2014 at 2:40 AM, Peter Otten <__peter__ at web.de> wrote:
>>> I think that you may get a Unicode/Encode/Error when you try to /decode/
>>> a unicode string is more confusing...
>>
>> Hang on a minute, what does it even mean to decode a Unicode string?
>
> Let's not get philosophical ;)

No, I'm quite serious. You encode Unicode text into bytes; you decode
bytes into text. You can also encode a floating-point value into
bytes, and decode bytes into a float. Or you could encode a large and
complex structure into bytes, using something like pickle or json, and
then decode those bytes later on. The pattern is always the same: the
abstract object with meaning to a human is encoded into a concrete
form that a computer can handle, and the concrete is decoded into the
abstract. If you're not good at sight-reading sheet music, you'll have
the same feeling of staring at the dots, decoding them one by one into
this abstract thing called "music", and then being able to work with
it.
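To make the "abstract in, concrete out" pattern concrete, here is a minimal sketch in Python 3 using only the standard library. The variable names are illustrative, and struct's little-endian double format ("<d") is one arbitrary choice of float encoding among many:

```python
import json
import struct

# Text <-> bytes: encode the abstract text, decode the concrete bytes.
text = "caf\u00e9"                      # "café"
data = text.encode("utf-8")            # b'caf\xc3\xa9'
assert data.decode("utf-8") == text

# Float <-> bytes: pack an IEEE 754 double into 8 bytes, then unpack it.
value = 3.25
packed = struct.pack("<d", value)
assert len(packed) == 8
assert struct.unpack("<d", packed)[0] == value   # unpack returns a tuple

# Structure <-> bytes: json gives text, which is then encoded to bytes.
record = {"name": "Chris", "scores": [1, 2, 3]}
blob = json.dumps(record).encode("utf-8")
assert json.loads(blob.decode("utf-8")) == record
```

In every case the encode step goes from the meaningful object to bytes, and the decode step goes back.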

When you try to decode a Unicode string, what happens is that Python 2
says "Oh, you're trying to do a byte-string operation on a Unicode
string... I'll quickly encode that to bytes for you (using the default
encoding, normally ASCII), then do what you asked". That's why you can
get an *en*coding error (UnicodeEncodeError) when you asked to *de*code:
both operations have to happen, and the implicit encode fails as soon
as the string contains a non-ASCII character.
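Python 3 removed str.decode() altogether, so the Python 2 behaviour can't be shown directly there. The sketch below emulates it with a hypothetical helper, py2_style_decode (not a real library function), assuming the default encoding is ASCII as it normally was in Python 2:

```python
def py2_style_decode(text, encoding="utf-8", default_encoding="ascii"):
    """Hypothetical emulation of Python 2's unicode.decode().

    Step 1 is the implicit *en*code with the default encoding;
    step 2 is the *de*code the caller actually asked for.
    """
    as_bytes = text.encode(default_encoding)  # may raise UnicodeEncodeError
    return as_bytes.decode(encoding)


print(py2_style_decode("hello"))  # ASCII-only text: both steps succeed

try:
    py2_style_decode("caf\u00e9")  # non-ASCII: the hidden encode step fails
except UnicodeEncodeError as exc:
    print("asked to decode, got:", type(exc).__name__)
```

The traceback names the encode step, even though the caller only ever asked for a decode.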

ChrisA
