[Tutor] string to binary and back... Python 3

Thu Jul 19 20:56:11 CEST 2012

My response is further down. Thank you, Wayne.

On 07/19/2012 12:52 PM, Wayne Werner wrote:
> I'll preface my response by saying that I know/understand fairly little
> about it, but since I've recently been smacked by this same issue when
> converting stuff to Python3, I'll see if I can explain it in a way that
> makes sense.
>
> On Wed, 18 Jul 2012, Jordan wrote:
>
>> OK, so I have been trying for a couple of days now and I am throwing in
>> the towel; Python 3 wins this one.
>> I want to convert a string to binary and back again like in this
>> question: Stack Overflow: Convert Binary to ASCII and vice versa
>> (Python)
>> <http://stackoverflow.com/questions/7396849/convert-binary-to-ascii-and-vice-versa-python>
>>
>> But in Python 3 I consistently get some sort of error relating to the
>> fact that nothing but bytes and bytearrays support the buffer interface,
>> or I get an overflow error because something is too large to be
>> converted to bytes.
>> Please help me, and then explain what I am not getting that is new in
>> Python 3. I would like to point out that I realize binary, hex, and
>> encodings are all very complex subjects, so I do not expect to master
>> them, but I do hope to gain a deeper insight. Thank you all.
>
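For reference, here is a minimal sketch of the round trip in Python 3, assuming UTF-8 as the encoding. The function names are made up for illustration and are not from the Stack Overflow answer. The key point is the explicit encode (str to bytes) and decode (bytes to str) at each end. The second pair of functions mirrors the "one big integer" approach; the overflow error mentioned above is likely `int.to_bytes` being given too small a length, so this sketch computes the length from `bit_length()`.

```python
def text_to_bits(text, encoding="utf-8"):
    """str -> bytes -> '01101000 01100101 ...' (8 binary digits per byte)."""
    data = text.encode(encoding)                 # explicit str -> bytes step
    return " ".join(format(byte, "08b") for byte in data)


def bits_to_text(bits, encoding="utf-8"):
    """'01101000 01100101 ...' -> bytes -> str."""
    data = bytes(int(group, 2) for group in bits.split())
    return data.decode(encoding)                 # explicit bytes -> str step


def text_to_int_bits(text, encoding="utf-8"):
    """Treat the whole encoded string as one big integer, returned as '0b...'."""
    return bin(int.from_bytes(text.encode(encoding), "big"))


def int_bits_to_text(bits, encoding="utf-8"):
    """Inverse of text_to_int_bits."""
    n = int(bits, 2)                             # int() accepts the '0b' prefix
    # to_bytes() raises OverflowError if the length is too small,
    # so work out how many bytes are needed to hold n.
    length = (n.bit_length() + 7) // 8
    return n.to_bytes(length, "big").decode(encoding)


bits = text_to_bits("hello")
print(bits)                # 01101000 01100101 01101100 01101100 01101111
print(bits_to_text(bits))  # hello
print(int_bits_to_text(text_to_int_bits("hello")))  # hello
```

Note that the big-integer version drops leading zero bytes (a string starting with "\x00" will not survive the round trip), which is one reason the per-byte version is usually the simpler choice.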
> The way I've read it - stop thinking about byte strings as if they are
> text. The biggest reason all this has changed is that Python has grown up
> and entered the world where Unicode actually matters. To us poor shmucks
> in the English-speaking countries of the world it's all very confusing
> because it's nothing we have to deal with. 26 letters is perfectly fine
> for us - and if we want uppercase we'll just throw in another 26. Add a
> few dozen punctuation marks and 256 is a perfectly fine number of
> characters.
>
> To make a slightly relevant side trip, when you were a kid did you ever
> send "secret" messages to a friend with a code like this?
>
> A = 1
> B = 2
> .
> .
> .
> Z = 26
>
> Well, that's basically what is going on when it comes to
> bytes/text/whatever. When you input some text, Python3 treats whatever
> you wrote as Unicode text. The nice thing for us 26-letter folks is that
> the ASCII alphabet we're so used to just so happens to map directly onto
> Unicode encodings like UTF-8 - so 'A' in ASCII is the same number (65) as
> 'A' in UTF-8.
>
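The A = 1 code idea maps onto Python 3 fairly directly. A small sketch using only the built-ins `ord`, `chr`, and `str.encode`: each character has a number (its code point), ASCII characters encode to the same single byte in both ASCII and UTF-8, and non-ASCII characters can encode to different bytes depending on the encoding chosen.

```python
# Every character has a number (its Unicode code point), like A = 1 ... Z = 26.
print(ord("A"))               # 65
print(chr(65))                # A

# For ASCII characters, ASCII and UTF-8 give the exact same single byte.
print("A".encode("ascii"))    # b'A'
print("A".encode("utf-8"))    # b'A'

# Outside ASCII the encoding matters: same character, different bytes.
print("é".encode("utf-8"))    # b'\xc3\xa9'  (two bytes)
print("é".encode("latin-1"))  # b'\xe9'      (one byte)
```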
> Now, here's the part that I had to (and still need to) wrap my mind
> around - if the string is "just bytes" then it doesn't really matter what
> the string is supposed to represent. It could represent the LATIN-1
> character set. Or UTF-8, -16, or some other weird encoding. And all the
> operations that are supposed to modify these strings of bytes (e.g.
> removing spaces, splitting on a certain "character", etc.) still work.
> Because if I have this string:
>
> 9 45 12 9 13 19 18 9 12 99 102
>
> and I tell you to split on the 9's, it doesn't matter if that's some
> weird ASCII character, or some equally weird UTF character, or something
> else entirely. And I don't have to worry about things getting munged up
> when I try to stick Unicode and ASCII values together - because they're
> converted to bytes first.
>
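The "split on the 9's" example can be tried directly on a `bytes` object. This sketch reuses the numbers from the example and never decodes them, so no encoding has to be chosen at all.

```python
# The byte values from the example, as a bytes object.
data = bytes([9, 45, 12, 9, 13, 19, 18, 9, 12, 99, 102])

# Split on the byte value 9 without decoding to text first.
parts = data.split(bytes([9]))

# Show each piece as a list of numbers rather than as escaped bytes.
print([list(part) for part in parts])  # [[], [45, 12], [13, 19, 18], [12, 99, 102]]
```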
> So the question is, of course, if it's all bytes, then why does it look
> like text when I print it out? Well, that's because Python decodes that
> byte stream into Unicode text before it's displayed. Or it decodes it as
> ASCII, if you tell it to.
>
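A short sketch of that decode step, assuming the bytes hold UTF-8 (the sample data is made up for illustration). Printing a `bytes` object shows its repr rather than text; you get text by decoding with a chosen encoding, and decoding with the wrong one (here ASCII, with `errors="replace"`) gives replacement characters instead.

```python
data = b"caf\xc3\xa9"            # raw bytes: "café" encoded as UTF-8

print(data)                      # b'caf\xc3\xa9'  (bytes print as their repr)

text = data.decode("utf-8")      # bytes -> str with the right encoding
print(text)                      # café

# The wrong encoding: ASCII can't decode 0xc3 or 0xa9, so each bad byte
# becomes U+FFFD, the Unicode replacement character.
print(data.decode("ascii", errors="replace"))
```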
> But Python3 has converted many of those functions that used to operate on
> plain strings (especially the I/O ones, like sockets and binary files) and
> made them operate on byte streams instead. Except for the ones that
> operate on text ;)
>
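A quick way to see the str/bytes split for yourself (the sample values are made up). Each type has its own methods, and Python 3 raises `TypeError` rather than silently mixing them. This is the same kind of error as the "buffer interface" message from the original question, though the exact wording varies between Python 3 versions.

```python
text = "spam eggs"       # str: Unicode text
data = b"spam eggs"      # bytes: raw byte values

# Each type has its own methods and works fine on its own.
print(text.split(" "))   # ['spam', 'eggs']
print(data.split(b" "))  # [b'spam', b'eggs']

# Mixing them implicitly is refused with a TypeError.
errors = []
for attempt in (lambda: data.split(" "), lambda: text + data):
    try:
        attempt()
    except TypeError as exc:
        errors.append(exc)
        print("TypeError:", exc)

# Converting explicitly makes both work again.
print(data.split(" ".encode("ascii")))  # [b'spam', b'eggs']
print(text + data.decode("ascii"))      # spam eggsspam eggs
```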
> Well, I hope that's of some use and isn't too much of a lie - like I said,
> I'm still trying to wrap my head around things and I've found that
> explaining (or trying to explain) to someone else is often the best way to
> work out the idea in your own head. If I've gone too far astray I'm sure
> the other helpful folks here will correct me :)
>
Thank you for the very informative post; every bit helps. It has
certainly been a challenge for me with the new "everything is bytes"
scheme, especially the way everything has to be converted to bytes before
it goes into a buffer.
> HTH,
> Wayne
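On converting everything to bytes before it goes into a buffer: a minimal sketch using the standard `io` module (the message string is made up). A binary buffer such as `io.BytesIO` only accepts bytes, so the str has to be encoded first; alternatively, `io.TextIOWrapper` can sit on top of the binary buffer and do the encoding for you.

```python
import io

message = "hello, tutor"

# An in-memory binary buffer behaves like a file opened in "wb" mode.
buf = io.BytesIO()
try:
    buf.write(message)               # a str is rejected by a binary buffer
    rejected = False
except TypeError as exc:
    rejected = True
    print("TypeError:", exc)

buf.write(message.encode("utf-8"))   # encode str -> bytes first
raw = buf.getvalue()
print(raw)                           # b'hello, tutor'
print(raw.decode("utf-8"))           # hello, tutor

# Alternative: let a text layer do the encoding on top of the binary buffer.
text_buf = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
text_buf.write(message)              # accepts str directly
text_buf.flush()                     # push the encoded bytes down to the buffer
print(text_buf.buffer.getvalue())    # b'hello, tutor'
```

The wrapper approach is what `open(..., "w", encoding=...)` gives you for real files, so in practice you rarely need to call `encode` by hand unless you are working with a binary buffer or socket directly.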