Diff between object graphs?

Wed Apr 22 08:53:38 EDT 2015

Cem Karan wrote:

> Hi all, I need some help.  I'm working on a simple event-based simulator
> for my dissertation research. The simulator has state information that I
> want to analyze as a post-simulation step, so I currently save (pickle)
> the entire simulator every time an event occurs; this lets me analyze the
> simulation at any moment in time, and ask questions that I haven't thought
> of yet.  The problem is that pickling this amount of data is both
> time-consuming and a space hog.  This is true even when using bz2.open()
> to create a compressed file on the fly.
> 
> This leaves me with two choices: first, pick the data I want to save, and
> second, find a way of generating diffs between object graphs.  Since I
> don't yet know all the questions I want to ask, I don't want to throw away
> information prematurely, which is why I would prefer to avoid scenario 1.
> 
> So that brings up possibility two: generating diffs between object graphs.
> I've searched around in the standard library and on PyPI, but I haven't
> yet found a library that does what I want.  Does anyone know of something
> that does?
> 
> Basically, I want something with the following ability:
> 
> Object_graph_2 - Object_graph_1 = diff_2_1
> Object_graph_1 + diff_2_1 = Object_graph_2
> 
> The object graphs are already pickleable, and the diffs must be, or this
> won't work.  I can use deepcopy to ensure the two object graphs are
> completely separate, so the diffing engine doesn't need to worry about
> that part.
> 
> Anyone know of such a thing?
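
If the state can be reduced to nested dicts (say, by walking each object's
__dict__), the subtract/add pair asked for above can be approximated with
plain recursion. This is a hypothetical sketch, not an existing library:
diff() and patch() are made-up names, lists and other values are treated as
opaque and copied whole when they change, and shared references and cycles
are not preserved.

```python
import copy


def diff(old: dict, new: dict) -> dict:
    """Return a pickleable description d such that patch(old, d) == new."""
    changes = {"set": {}, "sub": {}, "del": []}
    missing = object()  # sentinel: key absent from old
    for key in old:
        if key not in new:
            changes["del"].append(key)  # key removed in new
    for key, value in new.items():
        old_value = old.get(key, missing)
        if isinstance(old_value, dict) and isinstance(value, dict):
            sub = diff(old_value, value)  # recurse into nested dicts
            if sub["set"] or sub["sub"] or sub["del"]:
                changes["sub"][key] = sub
        elif old_value is missing or old_value != value:
            # New key, or a changed non-dict value: store a copy of it whole.
            changes["set"][key] = copy.deepcopy(value)
    return changes


def patch(old: dict, changes: dict) -> dict:
    """Return a new dict built from old plus changes; old is left untouched."""
    new = copy.deepcopy(old)
    _apply(new, changes)
    return new


def _apply(target: dict, changes: dict) -> None:
    for key in changes["del"]:
        del target[key]
    for key, sub in changes["sub"].items():
        _apply(target[key], sub)
    for key, value in changes["set"].items():
        target[key] = copy.deepcopy(value)
```

Saving one full snapshot plus diff(state_k, state_k+1) for each event gives
the Object_graph_1 + diff_2_1 = Object_graph_2 scheme; restoring state n then
means patching forward from the snapshot.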

A poor man's approach:

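Do not compress the pickled data; instead, check each pickle into version
control as a new revision of the same file. Compression would hide the
similarity between successive pickles that the version control system's delta
storage can exploit. Getting the n-th state then becomes checking out the n-th
revision of the file.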

I have no idea how much space you save that way, but it's simple enough to 
give it a try.
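
To get a feel for the idea without setting up a repository, the delta storage
can be imitated in-process with the standard library's difflib. This is a toy
stand-in, not how git or Mercurial actually store revisions, and make_delta,
apply_delta and PickleHistory are hypothetical names.

```python
import difflib
import pickle


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as copy/insert instructions relative to `old`."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))      # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("data", new[j1:j2]))  # store the new bytes literally
        # "delete": the old bytes are simply not copied
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild `new` from `old` and the output of make_delta()."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


class PickleHistory:
    """Keep revision 0 as a full pickle and every later revision as a delta."""

    def __init__(self):
        self.base = None   # full pickle of revision 0
        self.deltas = []   # deltas[i] turns revision i into revision i + 1
        self._last = None  # most recent pickle, the next delta's starting point

    def commit(self, obj) -> None:
        data = pickle.dumps(obj)
        if self.base is None:
            self.base = data
        else:
            self.deltas.append(make_delta(self._last, data))
        self._last = data

    def checkout(self, n: int):
        """Return a fresh copy of the object as it was at revision n."""
        data = self.base
        for ops in self.deltas[:n]:
            data = apply_delta(data, ops)
        return pickle.loads(data)
```

SequenceMatcher gets slow on large inputs, so for realistically sized pickles
a real version control system or a dedicated binary-delta tool is the more
practical route. Also note that pickles of two similar states are not
guaranteed to be byte-similar (set ordering, for instance, can change), so a
delta may be larger than the logical change.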

Another slightly more involved idea:

Make the events pickleable and log every one of them, but save the full
simulator only after every 100th event (for example). To restore the 7531st
state, load pickle 7500 and replay events 7501 to 7531.
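
A minimal sketch of the checkpoint-and-replay idea, assuming the simulator
has a method that applies a single event. Simulator, Event, Recorder and
apply() are hypothetical stand-ins, and everything is kept in memory where a
real version would write the checkpoints and the event log to disk.

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical pickleable event."""
    key: str
    delta: int


@dataclass
class Simulator:
    """Hypothetical simulator whose state is changed only by events."""
    counters: dict = field(default_factory=dict)

    def apply(self, event: Event) -> None:
        self.counters[event.key] = self.counters.get(event.key, 0) + event.delta


class Recorder:
    """Pickle the whole simulator every `interval` events, log every event."""

    def __init__(self, sim: Simulator, interval: int = 100):
        self.sim = sim
        self.interval = interval
        self.events = []                           # events[i] is event i + 1
        self.checkpoints = {0: pickle.dumps(sim)}  # state 0: before any event

    def step(self, event: Event) -> None:
        self.sim.apply(event)
        self.events.append(event)
        n = len(self.events)
        if n % self.interval == 0:
            self.checkpoints[n] = pickle.dumps(self.sim)

    def restore(self, n: int) -> Simulator:
        """Rebuild the state right after event n (1-based)."""
        base = (n // self.interval) * self.interval  # e.g. 7531 -> 7500
        sim = pickle.loads(self.checkpoints[base])
        for event in self.events[base:n]:            # events base+1 .. n
            sim.apply(event)
        return sim
```

Replay only reproduces the original run if applying an event is
deterministic; if the simulator draws random numbers, the state of its random
number generator has to be part of what gets pickled.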