Finding size of Variable

Steven D'Aprano steve+comp.lang.python at pearwood.info
Wed Feb 5 06:00:21 EST 2014


On Tue, 04 Feb 2014 21:35:05 -0800, Ayushi Dalmia wrote:

> On Wednesday, February 5, 2014 12:59:46 AM UTC+5:30, Tim Chase wrote:
>> On 2014-02-04 14:21, Dave Angel wrote:
>> 
>> > To get the "total" size of a list of strings, try (untested):
>> 
>> > 
>> > a = sys.getsizeof(mylist)
>> > for item in mylist:
>> >     a += sys.getsizeof(item)
>> 
>> 
>> I always find this sort of accumulation weird (well, at least in
>> Python; it's the *only* way in many other languages) and would write
>> it as
>> 
>>   a = getsizeof(mylist) + sum(getsizeof(item) for item in mylist)
>> 
> 
> This also doesn't give the true size. I did the following:


What do you mean by "true size"?

Do you mean the amount of space a certain amount of data will take in 
memory? With or without the overhead of object headers? Or do you mean 
how much space it will take when written to disk? You have not been clear 
what you are trying to measure.

If you are dealing with one-byte characters, you can measure the amount 
of memory they take up (excluding object overhead) by counting the number 
of characters: 23 one-byte characters require 23 bytes. Adding the object 
overhead gives:

py> sys.getsizeof('a'*23)
44

That is 44 bytes on this build: 23 bytes for the 23 single-byte 
characters, plus 21 bytes of overhead. One thousand such characters take:

py> sys.getsizeof('a'*1000)
1021

If you write such a string to disk, it will take 1000 bytes (or 1KB), 
unless you use some sort of compression.
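
If you want to check the overhead on your own interpreter, here is a 
minimal sketch. It assumes CPython and plain ASCII strings; the helper 
name string_overhead is mine, not anything from the standard library, and 
the actual numbers will differ between Python versions and platforms.

```python
import sys

def string_overhead(n):
    """Return getsizeof minus the character count for an ASCII string of
    n characters (hypothetical helper, for illustration only)."""
    s = 'a' * n
    return sys.getsizeof(s) - len(s)

# Print the length, the reported size and the overhead for two lengths.
for n in (23, 1000):
    print(n, sys.getsizeof('a' * n), string_overhead(n))
```

On CPython the overhead should come out the same for both lengths, which 
is why the size grows by exactly one byte per one-byte character.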

> import sys
> data=[]
> f=open('stopWords.txt','r')
> 
> for line in f:
>     line=line.split()
>     data.extend(line)
> 
> print sys.getsizeof(data)

This will give you the amount of space taken by the list object. It will 
*not* give you the amount of space taken by the individual strings.

A Python list looks like this:


    | header | array of pointers |


The header is of constant or near-constant size; the size of the array 
depends on the number of items in the list. The array may have room for 
more items than the list currently holds, e.g. a list with 1000 items 
might have allocated space for 2000 items. It will never have room for 
fewer.
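
You can watch the spare room appear by appending to a list and noting 
when the reported size changes. This is only a sketch and leans on 
CPython implementation details: the function name growth_steps is made 
up, and the capacity figure assumes that everything beyond the size of an 
empty list is pointer slots.

```python
import struct
import sys

POINTER_SIZE = struct.calcsize('P')   # bytes per pointer on this build
EMPTY_SIZE = sys.getsizeof([])        # list header with no slots allocated

def growth_steps(n):
    """Append n items one at a time and record (length, size, capacity)
    each time sys.getsizeof reports a new size (illustrative helper)."""
    items = []
    steps = [(0, EMPTY_SIZE, 0)]
    for _ in range(n):
        items.append(None)
        size = sys.getsizeof(items)
        if size != steps[-1][1]:
            # Assumption: the bytes beyond the empty list are pointer slots.
            capacity = (size - EMPTY_SIZE) // POINTER_SIZE
            steps.append((len(items), size, capacity))
    return steps

for length, size, capacity in growth_steps(50):
    print(length, size, capacity)
```

The size changes far less often than once per append, and at each change 
the estimated capacity is at least the current length.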
 
getsizeof(list) only counts the direct size of that list, including the 
array, but not the things which the pointers point at. If you want the 
total size, you need to count them as well.
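
One wrinkle with simply summing getsizeof over the items: if the same 
string object appears in the list more than once, it gets counted more 
than once. A rough sketch that counts each distinct object once follows. 
The name total_size is hypothetical, and it assumes a flat list of 
strings, not nested containers.

```python
import sys

def total_size(strings):
    """Size of the list object plus the size of each distinct object it
    points at (illustrative helper for a flat list of strings)."""
    seen = set()                       # ids of objects already counted
    total = sys.getsizeof(strings)     # header plus array of pointers
    for item in strings:
        if id(item) not in seen:
            seen.add(id(item))
            total += sys.getsizeof(item)
    return total

word = 'the'
words = [word, 'and', word]            # 'the' is one object, listed twice
print(sys.getsizeof(words), total_size(words))
```

Whether duplicates in your data really are the same object depends on how 
the strings were created, so treat the result as an estimate.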


> where stopWords.txt is a file of size 4KB

My guess is that if you split a 4K file into words, then put the words 
into a list, you'll probably end up with 6-8K in memory.


-- 
Steven


