Using python to delta-load files into a central DB

Chris Nethery gilcneth at earthlink.net
Thu Apr 12 13:05:15 EDT 2007


Hello everyone,

I have a challenging issue I need to overcome and was hoping I might gain 
some insights from this group.

I am trying to speed up the process I am using, which is as follows:

1) I have roughly 700 files that are modified throughout the day by users of 
a separate application

2) As modifications are made to the files, I use a polling service and mimic 
the lock-file strategy used by that application

3) I generate a single 'load' file and bulk insert into a load table

4) I update/insert/delete from the load table
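The polling step above might be sketched like this. This is only a rough sketch under assumptions not stated in the post: the directory path, the ".lock" companion-file convention, and the function names are all hypothetical.

```python
import os

LOCK_SUFFIX = ".lock"  # assumed lock-file convention of the other application

def poll_once(watch_dir, last_seen):
    """Return paths modified since the last poll, skipping locked files.

    last_seen is a dict mapping filename -> last observed mtime; it is
    updated in place so repeated calls only report new changes.
    """
    changed = []
    for name in sorted(os.listdir(watch_dir)):
        if name.endswith(LOCK_SUFFIX):
            continue
        path = os.path.join(watch_dir, name)
        # Skip files the other application is still writing
        if os.path.exists(path + LOCK_SUFFIX):
            continue
        mtime = os.path.getmtime(path)
        if mtime > last_seen.get(name, 0):
            last_seen[name] = mtime
            changed.append(path)
    return changed
```

A scheduler (or a simple loop with time.sleep) would call poll_once periodically and feed the changed paths to the load-file generation step.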

This is just too time-consuming, in my opinion.

At present, users of the separate application can run recalculation 
functions that modify all 700 files at once, causing my code to take the 
whole ball of wax, rather than just the data that has changed.

What I would like to do is spawn separate processes and load only the delta 
data.  The data must be 100% reliable, so I'm leery of using something like 
difflib.  I also want to make sure that my code scales, since the number of 
files is ever-increasing.
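One reliable alternative to difflib is to keep a per-row digest of each file's last loaded state and compare digests to classify rows as inserts, updates, or deletes. A minimal sketch, assuming rows have a stable key and that the snapshot dictionaries are kept somewhere between loads (the function names here are hypothetical):

```python
import hashlib

def row_hash(fields):
    """Stable digest of one record's field values."""
    joined = "\x1f".join(str(f) for f in fields)  # unit-separator join
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()

def diff_rows(old, new):
    """Compare two {key: hash} snapshots of a file.

    Returns (inserts, updates, deletes) as lists of keys, so only the
    delta rows need to go into the load table.
    """
    inserts = sorted(k for k in new if k not in old)
    updates = sorted(k for k in new if k in old and new[k] != old[k])
    deletes = sorted(k for k in old if k not in new)
    return inserts, updates, deletes
```

Because each file's diff is independent, the per-file hashing work could be farmed out to worker processes (e.g. with multiprocessing.Pool), which also addresses the case where a recalculation touches all 700 files at once.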

I would be grateful for any feedback you could provide.


Thank you,

Chris Nethery 





More information about the Python-list mailing list