[Numpy-discussion] Saving an array on disk to free memory - Pickling

Jean-Baptiste Rudant boogaloojb at yahoo.fr
Mon May 17 07:03:19 EDT 2010


Hello,

I tried to create an object:
- which behaves just like a numpy array;
- which can be saved to disk efficiently (numpy.save in my example, but PyTables in my real program);
- which can be "unloaded" (once it has been saved) to free memory: it can exist as an empty shell that knows how to retrieve the real values, and it is loaded only when we need to work with it;
- which unloads itself before being pickled (the values are already saved and don't have to be pickled).

I don't think it can inherit from ndarray, because sometimes (for example just after being unpickled and before being used) it is just an empty shell.
I don't think memmap can help either: I want to use PyTables to save it to disk, and I want it to be flexible (if I only use it temporarily, I just need it in memory and will never save it to disk).

My problems are:
- this code is ugly;
- I have to explicitly define all the special methods (__add__, __mul__, ...) of ndarray because:
 * __getattr__ doesn't retrieve them;
 * even if it did, I would have to explicitly set the type of the return value (if I understand correctly, __array_wrap__ takes care of this when a class inherits from ndarray).
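
To make the second problem concrete, here is a minimal sketch. It is an illustration under assumptions, not part of the program above: LazyProxy, ForwardingProxy and install_operators are hypothetical names. For new-style classes, an operator such as + looks the special method up on the type, so __getattr__ on the instance is never consulted. One possible workaround is to generate the forwarding methods in a loop rather than writing each one by hand, and to re-wrap the result there.

```python
import operator

import numpy


class LazyProxy(object):
    """Hypothetical stand-in for PersistentArray: forwards unknown attributes."""

    def __init__(self, values):
        self.values = values

    def __getattr__(self, key):
        # Only reached when normal lookup fails; never forward 'values' itself.
        if key == 'values':
            raise AttributeError(key)
        return getattr(self.values, key)


def install_operators(cls, names):
    """Attach forwarding special methods (e.g. __add__) to cls."""
    def make(op):
        def method(self, other):
            # Unwrap another proxy so the ndarray operation sees raw values.
            if isinstance(other, LazyProxy):
                other = other.values
            # Re-wrap explicitly: this is the step __array_wrap__ would
            # otherwise handle for a real ndarray subclass.
            return type(self)(op(self.values, other))
        return method

    for name in names:
        setattr(cls, '__%s__' % name, make(getattr(operator, name)))


class ForwardingProxy(LazyProxy):
    pass


install_operators(ForwardingProxy, ('add', 'sub', 'mul'))

p = LazyProxy(numpy.arange(3))
explicit = p.__add__(1)   # explicit attribute access does go through __getattr__
# p + 1 would raise TypeError here: the operator ignores __getattr__

q = ForwardingProxy(numpy.arange(3))
r = q + 1                 # works, and r is again a ForwardingProxy
```

The tuple of names passed to install_operators is deliberately short; reflected (__radd__) and in-place (__iadd__) variants would need their own handling.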

Thank you for the help.

Regards.

import numpy

class PersistentArray(object):
    def __init__(self, values):
        '''
        values is a numpy array
        '''
        self.values = values
        self.filename = None
        self.is_loaded = True
        self.is_saved = False
        
    def save(self, filename):
        self.filename = filename
        numpy.save(self.filename, self.values)
        self.is_saved = True
        
    def load(self):
        self.values = numpy.load(self.filename)
        self.is_loaded = True
    
    def unload(self):
        if not self.is_saved:
            raise Exception("PersistentArray must be saved before being unloaded")
        del self.values
        self.is_loaded = False
        
    def __getitem__(self, index):
        return self.values[index]
        
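    def __getattr__(self, key):
        # __getattr__ is only called when normal attribute lookup fails,
        # e.g. for 'values' after unload() has deleted it.
        if key == 'values':
            if not self.is_loaded:
                self.load()
            return self.values
        elif key == '__array_interface__':
            # I can't remember why I wrote this code, but I think it's
            # necessary to make pickling work properly.
            raise AttributeError(key)
        else:
            # Emulate ndarray inheritance: forward everything else to the
            # underlying array (this reloads it if necessary). A missing
            # attribute raises AttributeError from the array itself.
            return getattr(self.values, key)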
            
    def __setstate__(self, dict):
        self.__dict__.update(dict)
        if self.is_loaded and self.is_saved:
            self.load()
        
    def __getstate__(self):
        if not self.is_saved:
            raise Exception("PersistentArray must be saved before being pickled")
        odict = self.__dict__.copy()
        if self.is_saved:
            if self.is_loaded:
                odict['is_loaded'] = False
                del odict['values']
        return odict

filename = 'persistent_test.npy'

a = PersistentArray(numpy.arange(10e6))
a.save(filename)
a.sum()
a.unload()  # a still exists and knows how to reload its values if needed, but no longer holds them in memory
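
As a possible simplification of the class above, here is a sketch under assumptions (LazyArray and its attribute names are hypothetical, and numpy.save stands in for PyTables as in the example). It moves the lazy loading into a property, so __getattr__ no longer needs a special case for 'values', and __getstate__ only has to blank one field before pickling.

```python
import os
import pickle
import tempfile

import numpy


class LazyArray(object):
    """Hypothetical variant of PersistentArray with property-based loading."""

    def __init__(self, values):
        self._values = values
        self.filename = None      # None means "never saved"

    @property
    def values(self):
        # Reload from disk on first access after unload() or unpickling.
        if self._values is None:
            self._values = numpy.load(self.filename)
        return self._values

    @property
    def is_loaded(self):
        return self._values is not None

    def save(self, filename):
        # Use a name ending in '.npy'; numpy.save appends it otherwise.
        numpy.save(filename, self.values)
        self.filename = filename

    def unload(self):
        if self.filename is None:
            raise ValueError("array must be saved before being unloaded")
        self._values = None

    def __getstate__(self):
        if self.filename is None:
            raise ValueError("array must be saved before being pickled")
        state = self.__dict__.copy()
        state['_values'] = None   # the data lives on disk, never in the pickle
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), 'lazy_test.npy')
    a = LazyArray(numpy.arange(10))
    a.save(path)
    b = pickle.loads(pickle.dumps(a))   # b starts out as an empty shell
    print(b.is_loaded)
    print(b.values.sum())               # first access reloads from disk
    print(b.is_loaded)
```

As in the original class, changes made to the values after save() are lost on unload(); the sketch does not try to track modifications.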


      