"convert" string to bytes without changing data (encoding)

Evan Driscoll driscoll at cs.wisc.edu
Wed Mar 28 15:20:50 EDT 2012


Ross Ridge wrote:
> Steven D'Aprano<steve+comp.lang.python at pearwood.info>  wrote:
>> The right way to convert bytes to strings, and vice versa, is via
>> encoding and decoding operations.
>
> If you want to dictate to the original poster the correct way to do
> things then you don't need to do anything more than that.  You don't need to
> pretend, like Chris Angelico, that there isn't a direct mapping from
> his Python 3 implementation's internal representation of strings
> to bytes in order to label what he's asking for as being "silly".

That mapping may as well be:

   import random

   def get_bytes(some_string):
       # Pick an arbitrary length, then fill it with arbitrary byte values.
       length = random.randint(len(some_string), 5 * len(some_string))
       return bytes(random.randint(0, 255) for _ in range(length))

Of course this is hyperbole, but that is essentially as much of a 
guarantee as you get about what the result will be.

As many others have said, the encoding isn't defined, and I would guess 
it varies between implementations. (E.g. if Jython and IronPython use 
their host platforms' native strings, both have 16-bit chars and thus 
probably use UTF-16 encoding. I am not sure what CPython uses, but I bet 
it's *not* that.)
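
To make the point above concrete, here is a small sketch of how much the bytes depend on which encoding is chosen. The three encodings are only illustrative candidates; nothing here claims to show what any particular implementation actually stores internally.

```python
# The same five-character string under three plausible encodings.
text = "h\u00e9llo"  # 'h', e-acute, 'l', 'l', 'o'

utf8 = text.encode("utf-8")       # 1 byte for ASCII, 2 for the e-acute
utf16 = text.encode("utf-16-le")  # 2 bytes per character here
utf32 = text.encode("utf-32-le")  # 4 bytes per character

# Three different byte strings, with three different lengths.
print(len(utf8), len(utf16), len(utf32))
print(utf8, utf16, utf32, sep="\n")
```

Code that reached into the internal buffer would get whichever of these (or something else entirely) the implementation happened to use.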

It's not even guaranteed that the byte representation won't change! If 
something is lazily evaluated, or you have a COW (copy-on-write) string 
or something similar, the bytes backing it may differ from one moment to 
the next.


So yes, you can say that pretending there's not a mapping of strings to 
internal representation is silly, because there is. However, there's 
nothing you can say about that mapping.
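
By contrast, a mapping you can say something about is one where the encoding is named explicitly, as in the encode/decode approach quoted at the top. The following is a minimal sketch; `to_bytes` and `from_bytes` are hypothetical helper names introduced only for illustration, and UTF-8 as the default is an assumption, not something the thread prescribes.

```python
def to_bytes(text, encoding="utf-8"):
    """Hypothetical helper: encode text with an explicitly named encoding."""
    return text.encode(encoding)


def from_bytes(data, encoding="utf-8"):
    """Hypothetical helper: the inverse of to_bytes for the same encoding."""
    return data.decode(encoding)


# Round trip with a well-defined result on every Python 3 implementation.
original = "na\u00efve \u20ac"
assert from_bytes(to_bytes(original)) == original

# If the "string" is really arbitrary byte values in disguise, Latin-1 maps
# code points 0..255 one-to-one onto byte values 0..255.
raw = bytes(range(256))
assert raw.decode("latin-1").encode("latin-1") == raw
```

The Latin-1 round trip only works for code points below 256; anything higher raises `UnicodeEncodeError` rather than silently producing implementation-specific bytes.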

Evan


