save tuple of simple data types to disk (low memory foot print)

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri Oct 28 21:00:17 EDT 2011


On Fri, 28 Oct 2011 22:47:42 +0200, Gelonida N wrote:

> Hi,
> 
> I would like to save many dicts with a fixed amount of keys tuples to a
> file  in a memory efficient manner (no random, but only sequential
> access is required)

What do you call "many"? Fifty? A thousand? A thousand million? How many 
items in each dict? Ten? A million?

What do you mean "keys tuples"?


> As the keys are the same for each entry  I considered converting them to
> tuples.

I don't even understand what that means. You're going to convert the keys 
to tuples? What will that accomplish?


> The tuples contain only strings, ints (long ints) and floats (double)
> and the data types for each position within the tuple are fixed.
> 
> The fastest and simplest way is to pickle the data or to use json. Both
> formats however are not that optimal.

How big are your JSON files? 10KB? 10MB? 10GB?

Have you tried using pickle's space-efficient binary format instead of 
text format? Try using protocol=2 when you call pickle.Pickler.
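
Something along these lines (the file name and sample data are invented
purely for illustration):

import pickle

# One tuple per record, with a fixed layout of field types.
records = [("alice", 42, 3.14), ("bob", 7, 2.71)]

# Protocol 2 is the compact binary format; the default (protocol 0)
# is the much larger ASCII format.
with open("records.pkl", "wb") as f:
    pickle.dump(records, f, protocol=2)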

Or have you considered simply compressing the files?
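
You can even combine the two: gzip.open returns an ordinary file-like
object, so pickle can write straight into the compressed stream (again,
names here are just for the sake of the example):

import gzip
import pickle

records = [("alice", 42, 3.14), ("bob", 7, 2.71)]

# Pickle directly into a gzip-compressed file.
with gzip.open("records.pkl.gz", "wb") as f:
    pickle.dump(records, f, protocol=2)

# Reading it back is symmetrical.
with gzip.open("records.pkl.gz", "rb") as f:
    records = pickle.load(f)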


> I could store ints and floats with pack. As strings have variable length
> I'm not sure how to save them efficiently (except adding a length first
> and then the string).
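
For what it's worth, the length-prefix scheme you describe would look
roughly like this -- the three-field record layout is made up, purely to
show the idea:

import struct

def write_record(f, name, count, value):
    # Hypothetical layout: one variable-length string, one long, one double.
    encoded = name.encode("utf-8")
    # Write the string's length first, then the string itself, then the
    # fixed-size fields.
    f.write(struct.pack("<I", len(encoded)))
    f.write(encoded)
    f.write(struct.pack("<qd", count, value))

def read_record(f):
    (length,) = struct.unpack("<I", f.read(4))
    name = f.read(length).decode("utf-8")
    count, value = struct.unpack("<qd", f.read(16))
    return name, count, value

# Usage sketch: sequential writes to a binary file.
with open("records.bin", "wb") as f:
    write_record(f, "alice", 42, 3.14)
    write_record(f, "bob", 7, 2.71)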

This isn't 1980 and you're very unlikely to be using 720KB floppies. 
Premature optimization is the root of all evil. Keep in mind that when 
you save a file to disk, even if it contains only a single bit of data, 
the actual space used will be an entire block, which on modern hard 
drives is very likely to be 4KB. Trying to compress files smaller than a 
single block doesn't actually save you any space.


> Is there already some 'standard' way or standard library to store such
> data efficiently?

Yes: pickle or JSON, plus zip or gzip compression.


-- 
Steven


