Finding size of Variable

Ayushi Dalmia ayushidalmia2604 at gmail.com
Tue Feb 4 07:43:21 EST 2014


On Tuesday, February 4, 2014 5:10:25 PM UTC+5:30, Peter Otten wrote:
> Ayushi Dalmia wrote:
>
> > I have 10 files and I need to merge them (using k-way merging). The size
> > of each file is around 200 MB. Suppose I keep the merged data in a
> > variable named mergedData. I had thought of checking the size of
> > mergedData using sys.getsizeof(), but it doesn't give the actual
> > memory occupied.
> >
> > For example, if a file on disk occupies 4 KB and I read all its lines
> > into a list, the size of the list is only around 2100 bytes.
> >
> > Where am I going wrong? What are the alternatives I can try?
>
> getsizeof() gives you the size of the list only; to complete the picture you
> have to add the sizes of the lines.
>
> However, why do you want to keep track of the actual memory used by
> variables in your script? You should instead concentrate on the algorithm,
> and as long as either the size of the dataset is manageable or you can limit
> the amount of data accessed at a given time you are golden.
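Peter's point about adding the sizes of the lines can be sketched like this. In CPython a list object holds only references to its elements, so sys.getsizeof() on the list reports the list header plus its reference array, not the strings themselves. The helper name list_of_str_size is hypothetical; it is a minimal sketch that assumes a flat list of strings, counts each distinct object once by id(), and ignores allocator overhead.

```python
import sys

def list_of_str_size(lines):
    """Approximate memory of a flat list of strings: the list object itself
    (header plus reference array) plus every distinct string it refers to."""
    seen = set()
    total = sys.getsizeof(lines)
    for line in lines:
        if id(line) not in seen:      # count shared objects only once
            seen.add(id(line))
            total += sys.getsizeof(line)
    return total

# Hypothetical stand-in for lines read from a small file:
# each formatted string is a separate object, as readlines() would produce.
lines = ["line %d %s\n" % (i, "x" * 30) for i in range(100)]
print("list object only:", sys.getsizeof(lines))
print("list plus lines: ", list_of_str_size(lines))
```

The second number is the one that corresponds to the data you read; the first is what the original getsizeof() call was reporting.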

As I said, I need to merge large files and cannot afford extra I/O operations, so to minimise I/O I am writing in chunks. I also need to use the merged files as indexes later, and those should be loaded into memory for fast access. Hence the concern.
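For the merge itself, a k-way merge does not need the whole merged result in memory. The sketch below uses the standard library's heapq.merge, which pulls lazily from each input, so roughly one pending line per file (plus each file object's read buffer) is held at a time. It assumes every input file is already sorted by plain string order and ends with a newline; the function name k_way_merge and the lines_per_chunk parameter are hypothetical.

```python
import heapq
from contextlib import ExitStack

def k_way_merge(input_paths, output_path, lines_per_chunk=100_000):
    """Merge already-sorted text files into one sorted output file.

    Assumes each input is sorted line by line (string order) and that every
    line, including the last, ends with a newline.
    """
    with ExitStack() as stack:
        inputs = [stack.enter_context(open(p)) for p in input_paths]
        out = stack.enter_context(open(output_path, "w"))
        buffer = []
        # heapq.merge yields the smallest pending line across all inputs.
        for line in heapq.merge(*inputs):
            buffer.append(line)
            if len(buffer) >= lines_per_chunk:
                out.writelines(buffer)   # write one chunk, then reuse the list
                buffer.clear()
        out.writelines(buffer)           # flush the final partial chunk
```

Since open() already buffers reads and writes, passing a larger buffering argument to open() is an alternative to collecting lines explicitly; the explicit chunk just makes the memory/I/O trade-off visible.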

Could you please elaborate on what you mean by adding the sizes of the lines?
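Another way to see the full cost, lines included, is to measure allocations directly with the standard tracemalloc module (Python 3.4 and later) instead of summing getsizeof() values. This is a minimal sketch; the helper measure_allocation is hypothetical, and it only counts memory allocated while tracing is active that is still alive when build() returns.

```python
import tracemalloc

def measure_allocation(build):
    """Call build() and return (result, bytes), where bytes is the net
    traced memory still allocated after build() returns."""
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()   # (current, peak)
        result = build()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, after - before

# Hypothetical data standing in for lines read from a file.
lines, used = measure_allocation(
    lambda: ["line %d %s\n" % (i, "x" * 30) for i in range(1000)]
)
print("bytes allocated for the list and its lines:", used)
```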



More information about the Python-list mailing list