Diff between object graphs?
Cem Karan
cfkaran2 at gmail.com
Wed Apr 22 21:30:10 EDT 2015
On Apr 22, 2015, at 8:53 AM, Peter Otten <__peter__ at web.de> wrote:
> Cem Karan wrote:
>
>> Hi all, I need some help. I'm working on a simple event-based simulator
>> for my dissertation research. The simulator has state information that I
>> want to analyze as a post-simulation step, so I currently save (pickle)
>> the entire simulator every time an event occurs; this lets me analyze the
>> simulation at any moment in time, and ask questions that I haven't thought
>> of yet. The problem is that pickling this amount of data is both
>> time-consuming and a space hog. This is true even when using bz2.open()
>> to create a compressed file on the fly.
>>
>> This leaves me with two choices: first, pick the data I want to save, or
>> second, find a way of generating diffs between object graphs. Since I
>> don't yet know all the questions I want to ask, I don't want to throw away
>> information prematurely, which is why I would prefer to avoid the first option.
>>
>> So that brings up the second option: generating diffs between object graphs.
>> I've searched around in the standard library and on pypi, but I haven't
>> yet found a library that does what I want. Does anyone know of something
>> that does?
>>
>> Basically, I want something with the following ability:
>>
>> Object_graph_2 - Object_graph_1 = diff_2_1
>> Object_graph_1 + diff_2_1 = Object_graph_2
>>
>> The object graphs are already pickleable, and the diffs must be, or this
>> won't work. I can use deepcopy to ensure the two object graphs are
>> completely separate, so the diffing engine doesn't need to worry about
>> that part.
>>
>> Anyone know of such a thing?
>
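To make the two identities above concrete, here is a minimal sketch. It assumes the state can be flattened into a plain dict of picklable values keyed by some stable identifier; the names `diff` and `patch` are hypothetical, and nested structures, cycles, and shared references in a real object graph are not handled.

```python
import pickle


def diff(old, new):
    """Return a picklable delta such that patch(old, delta) == new."""
    # Keys that are new, or whose value differs from the old state.
    changed = {k: v for k, v in new.items() if k not in old or old[k] != v}
    # Keys that existed before but are gone now.
    removed = [k for k in old if k not in new]
    return {"changed": changed, "removed": removed}


def patch(old, delta):
    """Apply a delta produced by diff() to a copy of the old state."""
    result = {k: v for k, v in old.items() if k not in delta["removed"]}
    result.update(delta["changed"])
    return result


graph_1 = {"clock": 10, "node_a": [1, 2], "node_b": "idle"}
graph_2 = {"clock": 11, "node_a": [1, 2], "node_c": "busy"}

diff_2_1 = diff(graph_1, graph_2)
# The delta survives a pickle round trip, as required above.
restored = patch(graph_1, pickle.loads(pickle.dumps(diff_2_1)))
```

Only the delta needs to be stored per event, so unchanged entries such as `node_a` cost nothing after the first full snapshot.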
> A poor man's approach:
>
> Do not compress the pickled data; check it into version control. Getting the
> n-th state then becomes checking out the n-th revision of the file.
>
> I have no idea how much space you save that way, but it's simple enough to
> give it a try.
Sounds like a good approach, I'll give it a shot in the morning.
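A rough sketch of how this might look with git driven from Python. The file name `state.pkl`, the helper names, and the choice of git itself are all assumptions for illustration; `repo_dir` is assumed to be an already-initialized repository.

```python
import pickle
import subprocess
from pathlib import Path

STATE_FILE = "state.pkl"  # hypothetical file name inside the repository


def save_state(state, repo_dir):
    """Write the state as an uncompressed pickle with a fixed protocol."""
    path = Path(repo_dir) / STATE_FILE
    with open(path, "wb") as f:
        pickle.dump(state, f, protocol=4)
    return path


def commit_state(repo_dir, event_number):
    """Commit the current pickle; assumes repo_dir is an existing git repo."""
    subprocess.run(["git", "-C", str(repo_dir), "add", STATE_FILE], check=True)
    # --allow-empty keeps one commit per event even if nothing changed.
    subprocess.run(
        ["git", "-C", str(repo_dir), "commit", "--allow-empty", "-q",
         "-m", f"event {event_number}"],
        check=True,
    )


def load_state_at(repo_dir, revision):
    """Read the pickle as of `revision` (e.g. a commit hash), no checkout."""
    result = subprocess.run(
        ["git", "-C", str(repo_dir), "show", f"{revision}:{STATE_FILE}"],
        check=True, capture_output=True,
    )
    return pickle.loads(result.stdout)
```

Calling `save_state()` followed by `commit_state()` after each event gives one revision per event. Whether this actually saves space depends on how well the version control system's delta compression copes with pickle output, so it needs to be measured.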
> Another slightly more involved idea:
>
> Make the events pickleable, and save the simulator only for every 100th (for
> example) event. To restore the 7531st state, load pickle 7500 and apply
> events 7501 to 7531.
I was hoping to avoid doing this because I lose information. BUT, it's likely that this will be the best approach regardless of what other methods I use; there is just too much data.
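For reference, a minimal sketch of the checkpoint-and-replay idea. The names `record`, `restore`, and `apply_event` are hypothetical, and it assumes that applying an event is deterministic (any random number generator state would have to live inside the pickled state).

```python
import pickle


def record(state, events, apply_event, every=100):
    """Run all events, pickling every event and every `every`-th state."""
    checkpoints = {0: pickle.dumps(state)}  # state before any event
    log = []
    for n, event in enumerate(events, start=1):
        apply_event(state, event)
        log.append(pickle.dumps(event))
        if n % every == 0:
            checkpoints[n] = pickle.dumps(state)
    return checkpoints, log


def restore(n, checkpoints, log, apply_event, every=100):
    """Rebuild the state after event n from the nearest earlier checkpoint."""
    base = (n // every) * every
    state = pickle.loads(checkpoints[base])
    # Event k is stored at log[k - 1], so events base+1..n are log[base:n].
    for blob in log[base:n]:
        apply_event(state, pickle.loads(blob))
    return state


def add_to_total(state, event):
    """Toy event handler: advance the clock and accumulate the event value."""
    state["clock"] += 1
    state["total"] += event


checkpoints, log = record({"clock": 0, "total": 0}, range(1, 251), add_to_total)
state_137 = restore(137, checkpoints, log, add_to_total)
```

Restoring a state costs at most `every - 1` event replays, so the checkpoint interval trades disk space against reconstruction time.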
Thanks,
Cem Karan