"convert" string to bytes without changing data (encoding)

Terry Reedy tjreedy at udel.edu
Wed Mar 28 14:11:28 EDT 2012



On 3/28/2012 11:36 AM, Ross Ridge wrote:
> Chris Angelico<rosuav at gmail.com>  wrote:
>> What is a string? It's not a series of bytes.
>
> Of course it is.  Conceptually you're not supposed to think of it that
> way, but a string is stored in memory as a series of bytes.

*If* it is stored in byte memory. If you execute a 3.x program mentally 
or on paper, then there are no bytes.

If you execute a 3.3 program on a byte-oriented computer, then the 'a'
in the string might be represented by 1, 2, or 4 bytes, depending on the
other characters in the string. For the 2- and 4-byte forms, the actual
bit pattern will depend on whether the system is big- or little-endian.
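
A rough way to see this from Python, assuming CPython 3.3 or later
(other implementations may store strings differently): the helper name
bytes_per_char below is made up for illustration, and it only estimates
the width from sys.getsizeof() rather than reading the real layout.

```python
import sys

def bytes_per_char(s):
    """Estimate how many bytes CPython uses per character of s.

    Doubling the string doubles the character data but not the fixed
    object header, so the size difference divided by len(s) is the
    per-character width. This relies on a CPython implementation detail.
    """
    return (sys.getsizeof(s + s) - sys.getsizeof(s)) / len(s)

print(bytes_per_char("aaaa"))            # only ASCII characters
print(bytes_per_char("aaa\u20ac"))       # 'a' next to a euro sign
print(bytes_per_char("aaa\U0001F600"))   # 'a' next to a non-BMP character

# The same 2-byte unit for 'a' in each byte order.
print("a".encode("utf-16-le"), "a".encode("utf-16-be"), sys.byteorder)
```

On a CPython 3.3+ build I would expect the three widths to come out as
1, 2 and 4.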

My impression is that if you go down to the physical bit level, there
may again be no 'bytes' as a physical construct, since the bits may be
stored in parallel across multiple RAM chips.

> What he's asking for may not be very useful or practical, but if that's
> your problem here then that's what you should be addressing, not
> pretending that it's fundamentally impossible.

The python-level way to get the bytes of an object that supports the 
buffer interface is memoryview(). 3.x strings intentionally do not 
support the buffer interface as there is not any particular 
correspondence between characters (codepoints) and bytes.
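
A small sketch of that difference. The helper name supports_buffer is
hypothetical; it just reports whether memoryview() accepts the object.

```python
def supports_buffer(obj):
    """Return True if memoryview() accepts obj, False otherwise."""
    try:
        memoryview(obj)
    except TypeError:
        return False
    return True

view = memoryview(b"abc")
print(list(view))                        # the byte values of b"abc"
print(supports_buffer(b"abc"))           # bytes export a buffer
print(supports_buffer(bytearray(b"x")))  # so does bytearray
print(supports_buffer("abc"))            # 3.x str does not
```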

The OP could get the ordinal for each character and decide how *he* 
wants to convert them to bytes.

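ba = bytearray()
for c in s:
    i = ord(c)
    # Append whichever bytes you decide should represent i. As one
    # arbitrary choice: 4 bytes per character, little-endian.
    ba.extend(i.to_bytes(4, 'little'))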
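
To show that the result depends entirely on that choice, here is the
loop wrapped in a function. The name ordinals_to_bytes and its width and
byteorder parameters are my own invention for this sketch, not anything
built in; int.to_bytes() needs Python 3.2 or later.

```python
def ordinals_to_bytes(s, width=4, byteorder="little"):
    """Convert each character's ordinal to a fixed number of bytes."""
    ba = bytearray()
    for c in s:
        # Raises OverflowError if the ordinal does not fit in `width` bytes.
        ba.extend(ord(c).to_bytes(width, byteorder))
    return bytes(ba)

s = "a\u20ac"
# 4 bytes per character gives the same bytes as the UTF-32 codecs (no BOM).
print(ordinals_to_bytes(s) == s.encode("utf-32-le"))
print(ordinals_to_bytes(s, byteorder="big") == s.encode("utf-32-be"))
# 1 byte per character only works while every ordinal is below 256.
print(ordinals_to_bytes("a\xe9", width=1) == "a\xe9".encode("latin-1"))
```

With width=1 the euro sign does not fit, so the caller has to decide
what to do about it, which is exactly the decision an encoding makes.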

To get the particular bytes used for a particular string on a particular 
system, OP should use the C API, possibly through ctypes.
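
As a very rough sketch of the ctypes route, assuming CPython (where
id() happens to be the object's memory address): this copies the raw
memory of a str object, header included, without going through any
official string API. The helper name raw_str_bytes is hypothetical, and
the layout it exposes is an implementation detail that differs between
versions and platforms.

```python
import ctypes
import sys

def raw_str_bytes(s):
    """Copy the raw memory of the str object s (CPython only).

    sys.getsizeof() gives the object's size and id() its address; both
    facts are CPython implementation details, not language guarantees.
    """
    return ctypes.string_at(id(s), sys.getsizeof(s))

raw = raw_str_bytes("abc")
print(len(raw))            # header plus character data
print(b"abc" in raw)       # the ASCII text is stored inside the object
```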

-- 
Terry Jan Reedy



