Using python to delta-load files into a central DB

Chris Nethery gilcneth at earthlink.net
Thu Apr 12 22:51:22 EDT 2007


Gabriel,

Thank you for your reply.

Yes, they are tab-delimited text files that will change very little 
throughout the day.

But this is still messy, antiquated '80s junk.

Each row either carries a row type or contains a comment.  Each typed 
row has an identity value, but the comment rows do not.  The comment 
rows, however, are logically associated with the most recent typed row 
above them.  When I generate my bulk insert file, I add the identity of 
that row to each comment row, and generate and populate an additional 
identity column in order to retain the order of the comments.
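A minimal sketch of that tagging step, under a HYPOTHETICAL layout (the real column positions aren't given): field 0 holds the row type, field 1 holds the identity value, and a comment row is recognised by an empty field 0. The function name `tag_comment_rows` and the sample data are made up for illustration. Each comment row gets the identity of the last typed row plus a sequence number that preserves its order.

```python
import csv
import io

def tag_comment_rows(lines):
    """Yield (parent_id, comment_seq, fields) for every non-blank row.

    HYPOTHETICAL layout: field 0 holds the row type and field 1 the
    identity value; a comment row is recognised by an empty field 0.
    Typed rows get comment_seq 0; comment rows are numbered 1, 2, ...
    under the most recent typed row, which preserves their order.
    """
    parent_id = None
    comment_seq = 0
    # QUOTE_NONE: treat quote characters in the legacy data as plain text
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields:
            continue  # csv.reader yields [] for blank lines
        if fields[0]:
            # Typed row: it becomes the parent of any following comments
            parent_id = fields[1]
            comment_seq = 0
            yield parent_id, 0, fields
        else:
            if parent_id is None:
                raise ValueError("comment row appears before any typed row")
            comment_seq += 1
            yield parent_id, comment_seq, fields

# Usage with made-up sample data
sample = "HDR\t101\tfoo\n\tfirst note\n\tsecond note\nDTL\t102\tbar\n"
rows = list(tag_comment_rows(io.StringIO(sample)))
```

Here the two comment rows come out tagged with parent `"101"` and sequence numbers 1 and 2, so they can be bulk-inserted with both columns populated.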

Generally rows will either be added or changed, but sometimes rows will be 
removed.  Typically, only 1-5 new rows are added to a file on a given 
day, but users sometimes make manual corrections or deletions to older 
rows, and certain column values are sometimes recalculated.
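One way to get an exact delta without any fuzzy matching is to give every row a stable key and compare snapshots key by key. This sketch ASSUMES the key is a hypothetical `(identity, comment_seq)` pair and that a SHA-256 digest of each row was saved from the previous load; `row_digest` and `compute_delta` are illustrative names, not an existing API. Only the keys in `added`, `changed` and `removed` would then need to go to the DB.

```python
import hashlib

def row_digest(fields):
    """SHA-256 fingerprint of one row; fields are re-joined with tabs."""
    return hashlib.sha256("\t".join(fields).encode("utf-8")).hexdigest()

def compute_delta(old_digests, new_rows):
    """Classify keys as added, changed or removed.

    old_digests: {key: digest} saved from the previous load (hypothetical).
    new_rows:    {key: fields} parsed from today's file.
    Keys are assumed to be (identity, comment_seq) pairs.
    """
    old_keys, new_keys = set(old_digests), set(new_rows)
    added = sorted(new_keys - old_keys)
    removed = sorted(old_keys - new_keys)
    changed = sorted(
        k for k in old_keys & new_keys
        if old_digests[k] != row_digest(new_rows[k])
    )
    return added, changed, removed

# Usage with made-up snapshots
old = {
    ("101", 0): row_digest(["HDR", "101", "foo"]),
    ("101", 1): row_digest(["", "note"]),
    ("102", 0): row_digest(["DTL", "102", "bar"]),
}
new = {
    ("101", 0): ["HDR", "101", "foo"],
    ("101", 1): ["", "edited note"],
    ("103", 0): ["DTL", "103", "baz"],
}
added, changed, removed = compute_delta(old, new)
```

Storing digests rather than whole rows keeps the saved snapshot small; if relying on SHA-256 not colliding is unacceptable, store and compare the full field lists instead. One trade-off of numbering comments by position: inserting a comment in the middle renumbers the ones after it, so they show up as "changed" even though their text didn't change. The result is still correct, just a larger delta.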

Did I mention that the header contains another implied hierarchy? 
Fortunately, I can just ignore it and strip it off.


Thank you,

Chris Nethery



"Gabriel Genellina" <gagsl-py2 at yahoo.com.ar> wrote in message 
news:mailman.6440.1176427176.32031.python-list at python.org...
> On Thu, 12 Apr 2007 14:05:15 -0300, Chris Nethery <gilcneth at earthlink.net> 
> wrote:
>
>> At present, users of the separate application can run recalculation
>> functions that modify all 700 files at once, causing my code to take the
>> whole ball of wax, rather than just the data that has changed.
>
> Are they text files, or what?
> What kind of modifications? some lines changed/deleted/added? a column 
> recalculated along the whole file?
>
>> What I would like to do is spawn separate processes and load only the
>> delta data.  The data must be 100% reliable, so I'm leery of using
>> something like difflib.  I also want to make sure that my code scales,
>> since the number of files is ever-increasing.
>
> Why don't you like difflib? AFAIK it has no known bugs.
>
> -- 
> Gabriel Genellina
> 
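On the quoted difflib question: `difflib.SequenceMatcher` is deterministic, and `get_opcodes()` describes how to turn one sequence into the other, so one way to build confidence in it is to replay the opcodes and check that the new file comes back exactly. A minimal sketch; `line_delta` and `apply_delta` are hypothetical helper names and the sample lines are made up. `autojunk=False` switches off the heuristic that treats very frequent lines as junk in sequences of 200 or more items.

```python
import difflib

def line_delta(old_lines, new_lines):
    """Return the non-'equal' difflib opcodes turning old_lines into new_lines."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]

def apply_delta(old_lines, new_lines, ops):
    """Replay the opcodes to rebuild new_lines (a self-check, not a patch tool)."""
    result, pos = [], 0
    for tag, i1, i2, j1, j2 in ops:
        result.extend(old_lines[pos:i1])  # unchanged run before this edit
        result.extend(new_lines[j1:j2])   # replaced/inserted lines ('delete' adds none)
        pos = i2                          # skip the old lines that were replaced/deleted
    result.extend(old_lines[pos:])        # unchanged tail
    return result

# Usage with made-up file contents
old = ["HDR\t101\tfoo", "\tnote", "DTL\t102\tbar"]
new = ["HDR\t101\tfoo", "\tedited note", "DTL\t102\tbar", "DTL\t103\tbaz"]
ops = line_delta(old, new)
assert apply_delta(old, new, ops) == new
```

A line diff says which lines changed, not which DB rows they belong to, so comment rows would still need their parent identity attached before loading.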