Finding size of Variable

Tim Chase python.list at tim.thechases.com
Mon Feb 10 09:43:08 EST 2014


On 2014-02-10 06:07, wxjmfauth at gmail.com wrote:
> Python does not save memory at all. A str (unicode string)
> uses less memory only - and only - because and when one uses
> explicitly characters which are consuming less memory.
> 
> Not only the memory gain is zero, Python falls back to the
> worse case.
> 
> >>> sys.getsizeof('a' * 1000000)
> 1000025
> >>> sys.getsizeof('a' * 1000000 + 'œ')
> 2000040
> >>> sys.getsizeof('a' * 1000000 + 'œ' + '\U00010000')
> 4000048

If Python used UTF-32 for EVERYTHING, then all three of those cases
would be about 4000048 bytes. Only the last one is, so your own
numbers disprove your claim that "Python does not save memory at all".
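
A rough way to see the per-character cost directly, assuming CPython
3.3 or later (the sizes are an implementation detail, and
bytes_per_char is just a name made up for this sketch), is to subtract
the sizes of two strings built from the same character so that the
fixed object header cancels out:

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate the storage cost of one code point in a str.

    Hypothetical helper for illustration only. It compares two strings
    made of the same character; the fixed per-object header cancels in
    the subtraction, leaving only the per-character cost. The result
    is specific to CPython 3.3+ and its flexible string representation.
    """
    short = sys.getsizeof(ch * n)
    long_ = sys.getsizeof(ch * (2 * n))
    return (long_ - short) // n

for ch in ('a', '\u0153', '\U00010000'):
    print('U+%04X: %d byte(s) per character' % (ord(ch), bytes_per_char(ch)))
```

The width is chosen per string from its widest character, which is
why adding a single non-ASCII character to a long ASCII string changes
the total so much.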

> The opposite of what the utf8/utf16 do!
> 
> >>> sys.getsizeof(('a' * 1000000 + 'œ' +
> ... '\U00010000').encode('utf-8'))
> 1000023
> >>> sys.getsizeof(('a' * 1000000 + 'œ' +
> ... '\U00010000').encode('utf-16'))
> 2000025

However, as pointed out repeatedly, string indexing in a fixed-width
encoding is O(1), while indexing into a variable-width encoding (e.g.
UTF-8/UTF-16) is O(N).  The FSR gives the benefit of O(1) indexing
while saving space when a string doesn't need a full 32 bits per
character.
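
To make the O(N) point concrete, here is a minimal sketch of what
indexing by character position into UTF-8 bytes involves. The names
_width and utf8_char_at are invented for this example, and the code
assumes the input is valid UTF-8:

```python
def _width(lead):
    """Number of bytes in a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    return 4

def utf8_char_at(data, index):
    """Return the character at position `index` in UTF-8 bytes `data`.

    Hypothetical helper for illustration; assumes `data` is valid
    UTF-8 and `index` is non-negative. Characters take 1 to 4 bytes
    each, so the byte offset is unknown until every earlier character
    has been skipped over. That walk is the O(N) cost.
    """
    pos = 0
    for _ in range(index):
        if pos >= len(data):
            raise IndexError('index out of range')
        pos += _width(data[pos])
    if pos >= len(data):
        raise IndexError('index out of range')
    return data[pos:pos + _width(data[pos])].decode('utf-8')

s = 'a' * 5 + '\u0153' + '\U00010000' + 'z'
data = s.encode('utf-8')
# s[i] is a direct offset computation; utf8_char_at has to scan.
assert all(utf8_char_at(data, i) == s[i] for i in range(len(s)))
```

With a fixed width per character, s[i] is just base + i * width, with
no scan needed.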

-tkc





