How to instantiate in a lazy way?

Slaunger Slaunger at gmail.com
Mon Dec 1 08:40:22 EST 2008


Hi comp.lang.python,

I am a novice Python programmer working on a project where I deal with
large binary files (>50 GB each) consisting of a series of
variable-sized data packets.

Each packet consists of a small header with size and other information
and a much larger payload containing the actual data.

Using Python 2.5, struct and numpy arrays, I am able to parse such
a file quite efficiently into Header and Payload objects, which I then
manipulate in various ways.
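A minimal sketch of this kind of packet-by-packet parsing with struct, assuming a hypothetical header layout (a little-endian uint32 payload size followed by a uint32 channel id). The real header format and the proprietary float format are not given here, so `parse_floats` simply treats the payload as big-endian IEEE 32-bit floats as a stand-in. The names `Header`, `read_header`, `parse_floats`, `iter_packets` and `make_packet` are illustrative only. The sketch is written for Python 3 and avoids numpy to stay self-contained.

```python
import io
import struct

# Hypothetical header layout (an assumption, not the real format):
# little-endian uint32 payload size in bytes, uint32 channel id.
HEADER = struct.Struct("<II")


class Header(object):
    def __init__(self, size, channel):
        self.size = size
        self.channel = channel


def read_header(f):
    """Read one header, or return None at end of file."""
    raw = f.read(HEADER.size)
    if len(raw) < HEADER.size:
        return None
    size, channel = HEADER.unpack(raw)
    return Header(size, channel)


def parse_floats(raw):
    """Stand-in for the proprietary float conversion: here the payload
    is simply treated as big-endian IEEE 32-bit floats."""
    return list(struct.unpack(">%df" % (len(raw) // 4), raw))


def iter_packets(f):
    """Yield (header, parsed payload) pairs, parsing every payload."""
    while True:
        header = read_header(f)
        if header is None:
            return
        yield header, parse_floats(f.read(header.size))


def make_packet(channel, values):
    """Build one packet in the hypothetical format (for testing)."""
    payload = struct.pack(">%df" % len(values), *values)
    return HEADER.pack(len(payload), channel) + payload
```

For example, `list(iter_packets(io.BytesIO(make_packet(1, [1.0, 2.5]))))` gives one `(Header, [1.0, 2.5])` pair. Every payload is converted here, which is exactly the cost the lazy variant is meant to avoid.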

The most time-consuming part of the parsing is converting the payloads
from a proprietary 32-bit float format into the IEEE floats used
internally by Python.

For many use cases I am actually not interested in parsing the payload
right when I pass through it, as I may want to use the header
attributes to select the roughly 1 in 1000 payloads whose data I
actually have to look into and run the resource-intensive float
conversion on.
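The selection step described above can be sketched as a generator that reads every header but uses `f.seek()` to jump over payloads that fail a header-based test, so those payloads are never read or converted. The header layout (uint32 size, uint32 channel) and the `select_payloads(f, wanted)` helper with its predicate argument are hypothetical; this is a sketch, not a definitive implementation.

```python
import io
import struct

# Hypothetical header layout (assumption): uint32 payload size, uint32 channel.
HEADER = struct.Struct("<II")


def select_payloads(f, wanted):
    """Yield (channel, raw payload bytes) only for packets whose header
    passes the predicate `wanted`; all other payloads are skipped with
    f.seek() and never read or converted."""
    while True:
        raw = f.read(HEADER.size)
        if len(raw) < HEADER.size:
            return
        size, channel = HEADER.unpack(raw)
        if wanted(channel):
            yield channel, f.read(size)
        else:
            f.seek(size, io.SEEK_CUR)  # skip the payload without reading it
```

With a predicate such as `lambda ch: ch == 1`, only the matching raw payloads are returned, and the expensive float conversion can then be applied to just those.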

I would therefore like to have two variants of a Payload class. One is
instantiated right away, with the payload parsed into float arrays
that are available as instance attributes. In the other variant, the
Payload object at instantiation time only contains a pointer to the
place in the file (f.tell()) where the payload begins. Only when the
not-yet-existing attribute holding the parsed data is actually
accessed should the data be read and parsed, and the attribute created.

In pseudocode:

class PayloadInstant(object):
    """
    This is a normal Payload, where the data are parsed when
    instantiated.
    """

    @classmethod
    def read_from_file(cls, f, size):
        """
        Return a PayloadInstant instance with float data parsed and
        immediately accessible in the data attribute. Instantiation
        is slow, but after instantiation, access is fast.
        """
        raw = f.read(size)
        # parse_payload is a hypothetical placeholder for the slow
        # conversion of the proprietary 32-bit floats to IEEE floats.
        return cls(parse_payload(raw))

    def __init__(self, the_data):
        self.data = the_data

class PayloadOnDemand(object):
    """
    Behaves like a PayloadInstant object, but instantiation is faster,
    as only the position of the payload in the file is stored
    initially in the object.
    Only when the initially non-existing data attribute is accessed
    are the data actually read and the attribute created and bound to
    the instance.
    This first access will be a little slower than in PayloadInstant,
    as the correct file position has to be sought first.
    On later accesses, the object has attribute access as efficient
    as PayloadInstant.
    """

    @classmethod
    def read_from_file(cls, f, size):
        pos = f.tell()
        f.seek(pos + size)  # Skip to end of payload
        return cls(pos)

    # I probably need some __getattr__ or __getattribute__ magic here...??

    def __init__(self, a_file_position):
        self.file_position = a_file_position

My question is: is this a Pythonic way to do it? And if so, I would
like a hint as to how to make the hook inside the PayloadOnDemand
class, such that the lazy creation of the attribute is completely
hidden from the outside.
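A possible hook for this is `__getattr__`, which Python calls only when normal attribute lookup fails (unlike `__getattribute__`, which runs on every access). Once `data` has been assigned as an ordinary instance attribute, `__getattr__` is never invoked for it again, so callers cannot tell the two variants apart. This is a sketch under assumptions: the constructor additionally takes the file object and payload size, and the big-endian IEEE unpacking stands in for the real proprietary conversion.

```python
import io
import struct


class PayloadOnDemand(object):
    """Lazy variant using __getattr__, which Python calls only when the
    normal attribute lookup fails. After the first load, `data` lives in
    the instance __dict__, so __getattr__ is no longer involved."""

    @classmethod
    def read_from_file(cls, f, size):
        pos = f.tell()
        f.seek(pos + size)  # Skip to end of payload
        return cls(f, pos, size)

    def __init__(self, f, a_file_position, size):
        self._file = f
        self.file_position = a_file_position
        self.size = size

    def __getattr__(self, name):
        # Only the lazily created attribute is handled; anything else
        # raises AttributeError as usual.
        if name != "data":
            raise AttributeError(name)
        here = self._file.tell()
        self._file.seek(self.file_position)
        raw = self._file.read(self.size)
        self._file.seek(here)  # restore position for any ongoing scan
        # Stand-in for the proprietary float conversion (assumption).
        self.data = list(struct.unpack(">%df" % (self.size // 4), raw))
        return self.data
```

Raising `AttributeError` for every other name keeps `hasattr()` and typo detection working normally.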

I guess I could also just make a single class, and let an OnDemand
attribute decide how it should behave.
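The single-class idea could be sketched as one `Payload` class whose `read_from_file` takes a flag choosing eager or lazy parsing; either way, callers only ever read `payload.data`. The `on_demand` keyword, the `parse_floats` helper and its big-endian IEEE stand-in for the proprietary format are hypothetical names and assumptions, not an established API.

```python
import io
import struct


def parse_floats(raw):
    # Stand-in for the proprietary float conversion (assumption).
    return list(struct.unpack(">%df" % (len(raw) // 4), raw))


class Payload(object):
    """Single class; the on_demand flag chooses eager or lazy parsing.
    Either way, callers just read payload.data."""

    @classmethod
    def read_from_file(cls, f, size, on_demand=False):
        pos = f.tell()
        if on_demand:
            f.seek(pos + size)  # Skip to end of payload
            return cls(f=f, position=pos, size=size)
        return cls(data=parse_floats(f.read(size)))

    def __init__(self, data=None, f=None, position=None, size=None):
        if data is not None:
            self.data = data  # eager: the attribute exists immediately
        self._file = f
        self._position = position
        self._size = size

    def __getattr__(self, name):
        # Only reached when `data` was not set eagerly.
        if name != "data" or self._file is None:
            raise AttributeError(name)
        here = self._file.tell()
        self._file.seek(self._position)
        self.data = parse_floats(self._file.read(self._size))
        self._file.seek(here)  # restore position for any ongoing scan
        return self.data
```

Keeping one class means the rest of the code never needs to know which mode was used; the trade-off is a slightly larger instance carrying unused file bookkeeping in eager mode.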

My real application is considerably more complicated than this, but I
think the example captures the problem in a nutshell.

-- Slaunger
