How to turn a string into a list of integers?

Chris Angelico rosuav at gmail.com
Sun Sep 7 11:04:34 EDT 2014


On Mon, Sep 8, 2014 at 12:52 AM, MRAB <python at mrabarnett.plus.com> wrote:
> I don't think you should be saying that it stores the string in Latin-1
> or UTF-16 because that might suggest that they are encoded. They aren't.

Except that they are. RAM stores bytes [1], so by definition
everything that's in memory is encoded. You can't store a list in
memory; what you store is a set of bits which represent some metadata
and a bunch of pointers. You can't store a non-integer number directly
either, so you use some kind of efficient packed format like IEEE 754.
You can't even store an integer without using some kind of encoding,
most likely by packing it into some number of bytes and laying those
bytes out either least significant first (little-endian) or most
significant first (big-endian). So yes, CPython 3.3 stores strings
encoded as Latin-1, UCS-2 [2], or UCS-4. The Python string *is* a
sequence of characters, but it's *stored* as a sequence of bytes in
one of those encodings. (And other Pythons may not use the same
encodings. MicroPython uses UTF-8 internally, which gives it *very*
different indexing performance.)
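
If you want to poke at this yourself, here is a rough sketch using
only the standard library. The name bytes_per_char is just something
made up for this illustration, and the sys.getsizeof trick relies on
CPython 3.3+ behaviour; other Pythons may well report different
numbers.

```python
import struct
import sys

# An integer has to be laid out as bytes, in one order or the other.
n = 1027  # 0x0403
little = n.to_bytes(2, "little")  # low byte first
big = n.to_bytes(2, "big")        # high byte first

# A float gets packed as IEEE 754 (here a 64-bit double, big-endian).
packed = struct.pack(">d", 1.5)

# Hypothetical helper: estimate how many bytes CPython uses per
# character by measuring how much a str grows when one more copy of
# the same character is added. CPython-specific, not a language rule.
def bytes_per_char(ch):
    return sys.getsizeof(ch * 101) - sys.getsizeof(ch * 100)

print(little, big, packed.hex())
for ch in ("a", "\u00e9", "\u20ac", "\U0001F600"):
    print("U+%04X" % ord(ch), bytes_per_char(ch))
```

On CPython 3.3 or later I would expect the last loop to show widths of
1, 1, 2 and 4 bytes, matching the Latin-1, UCS-2 and UCS-4 cases.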

ChrisA

[1] On modern systems it stores larger units, probably 64-bit or
128-bit hunks, but whatever. Same difference.
[2] As Steven says, UTF-16 or UCS-2. I prefer the latter name here, as
it (like Latin-1) is restricted in character set rather than variable
in length. But same thing.
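
A small sketch of the distinction in [2], using the standard
utf-16-le codec to count 16-bit code units. The variable names are
only for illustration.

```python
bmp = "\u20ac"         # EURO SIGN, inside the Basic Multilingual Plane
astral = "\U0001F600"  # a character outside the BMP

# Number of 16-bit code units each character needs in UTF-16.
bmp_units = len(bmp.encode("utf-16-le")) // 2
astral_units = len(astral.encode("utf-16-le")) // 2

print(bmp_units, astral_units)
```

The astral character needs a surrogate pair (two code units), which is
the variable-length part of UTF-16. UCS-2 has no surrogate pairs and
simply cannot represent that character.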


