chunking a long string?
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Fri Nov 8 19:46:32 EST 2013
On Fri, 08 Nov 2013 12:43:43 -0800, wxjmfauth wrote:
> "(say, 1 kbyte each)": one "kilo" of characters or bytes?
>
> Glad to read some users are still living in an ascii world, at the
> "Unicode time" where an encoded code point size may vary between 1-4
> bytes.
>
>
> Oops, sorry, I'm wrong,
That part is true.
> it can be much more.
That part is false. You're measuring the overhead of the object
structure, not the per-character storage. This has been the case
since at least Python 2.2: strings are objects, and objects have overhead.
>>>> sys.getsizeof('ab')
> 27
27 bytes for two characters! Except it isn't: it's actually 25 bytes for
the object header plus two bytes for the two characters.
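That header-plus-payload split is easy to check yourself. A quick sketch (the exact header size varies by Python version, build, and pointer width, but the arithmetic is the same):

```python
import sys

# An empty string is pure header: no character data at all.
header = sys.getsizeof('')

# Each ASCII character adds one byte of payload on top of that header
# (on CPython 3.3+ with PEP 393 compact strings).
payload = sys.getsizeof('ab') - header
print(header, payload)  # payload is 2: one byte per ASCII character
```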
>>>> sys.getsizeof('a\U0001d11e')
> 48
And here you have four bytes for each of the two characters and a
40-byte header. Observe:
py> c = '\U0001d11e'
py> len(c)
1
py> sys.getsizeof(2*c) - sys.getsizeof(c)
4
py> sys.getsizeof(1000*c) - sys.getsizeof(999*c)
4
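The same difference trick exposes all three storage widths that PEP 393 strings use on CPython 3.3 or later. A small sketch (`per_char` is a throwaway helper, not anything in the stdlib):

```python
import sys

def per_char(ch):
    """Marginal storage cost of one more copy of ch, in bytes."""
    return sys.getsizeof(1000 * ch) - sys.getsizeof(999 * ch)

print(per_char('a'))           # ASCII/Latin-1: 1 byte per character
print(per_char('\u0394'))      # BMP beyond Latin-1: 2 bytes per character
print(per_char('\U0001d11e'))  # astral plane: 4 bytes per character
```

So a string is stored at the narrowest width that can hold its widest character, which is exactly why the musical symbol above costs four bytes per copy.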
How big is the object overhead on a (say) thousand-character string? Just
one percent:
py> (sys.getsizeof(1000*c) - 4000)/4000
0.01
--
Steven
More information about the Python-list mailing list