Using Python to delta-load files into a central DB
Chris Nethery
gilcneth at earthlink.net
Thu Apr 12 22:51:22 EDT 2007
Gabriel,
Thank you for your reply.
Yes, they are tab-delimited text files that will change very little
throughout the day.
Nonetheless, the format is messy, antiquated '80s junk.
Each row either carries a row type or contains a comment. Every typed row
has an identity value, but the 'comment' rows do not. The comment rows are,
however, logically associated with the most recent typed row above them.
When I generate my bulk insert file, I add the identity of the last
occurring row type to the comment rows, and generate and populate an
additional identity column in order to retain the order of the comments.
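
A minimal sketch of that tagging step, in case it makes the layout clearer.
The field positions and the row-type names ("A", "B") are made up for
illustration, as is the function name; the real files differ. It only
assumes that a row whose first field is a known row type carries its
identity in the second field, and that anything else is a comment.

```python
import csv
import io

# Hypothetical row-type names; the real files use their own codes.
KNOWN_ROW_TYPES = {"A", "B"}


def tag_comment_rows(lines, known_types=KNOWN_ROW_TYPES):
    """Yield (parent_id, comment_seq, fields) for each non-blank row.

    Assumed layout: field 0 is the row type and field 1 is the identity.
    Typed rows get comment_seq 0. Comment rows inherit the identity of the
    last typed row and get a running number that preserves their order.
    """
    parent_id = None
    comment_seq = 0
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        if fields[0] in known_types:
            parent_id = fields[1]
            comment_seq = 0
            yield parent_id, 0, fields
        else:
            comment_seq += 1
            yield parent_id, comment_seq, fields


sample = "A\t101\t5.0\nfirst note\nsecond note\nB\t102\t7.5\nanother note\n"
tagged = list(tag_comment_rows(io.StringIO(sample)))
```

The (parent_id, comment_seq) pair then serves as a key for each row in the
bulk insert file.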
Generally rows will either be added or changed, but sometimes rows will be
removed. Typically, only 1-5 new rows will be added to a file in a given
day, but users sometimes make manual corrections/deletions to older rows and
sometimes certain column values are recalculated.
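
One way the added/changed/removed cases could be detected without difflib
is to keep a digest per row from the previous load and compare it with the
current file. This is only a sketch under my own assumptions: the keys are
(identity, comment sequence) pairs, and fingerprint() and compute_delta()
are hypothetical helper names, not anything from an existing library.

```python
import hashlib


def fingerprint(fields):
    """Return a stable digest of one row's fields."""
    joined = "\t".join(fields).encode("utf-8")
    return hashlib.sha1(joined).hexdigest()


def compute_delta(old, new):
    """Compare two {key: digest} snapshots of the same file.

    Returns (added, changed, removed) as sorted lists of keys.
    """
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    changed = sorted(k for k in new if k in old and new[k] != old[k])
    return added, changed, removed


# Snapshot from the previous load (keys are hypothetical).
old = {
    ("101", 0): fingerprint(["A", "101", "5.0"]),
    ("101", 1): fingerprint(["first note"]),
    ("102", 0): fingerprint(["B", "102", "7.5"]),
}
# Current file: row 101 recalculated, its comment deleted, row 103 added.
new = {
    ("101", 0): fingerprint(["A", "101", "5.5"]),
    ("102", 0): fingerprint(["B", "102", "7.5"]),
    ("103", 0): fingerprint(["A", "103", "1.0"]),
}
added, changed, removed = compute_delta(old, new)
```

One caveat with this keying: if a comment in the middle of a group is
deleted, the comments after it are renumbered and will show up as changed.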
Did I mention that the header contains another implied hierarchy?
Fortunately, I can just ignore it and strip it off.
Thank you,
Chris Nethery
"Gabriel Genellina" <gagsl-py2 at yahoo.com.ar> wrote in message
news:mailman.6440.1176427176.32031.python-list at python.org...
> On Thu, 12 Apr 2007 14:05:15 -0300, Chris Nethery <gilcneth at earthlink.net>
> wrote:
>
>> At present, users of the separate application can run recalculation
>> functions that modify all 700 files at once, causing my code to take the
>> whole ball of wax, rather than just the data that has changed.
>
> Are they text files, or what?
> What kind of modifications? Some lines changed/deleted/added? A column
> recalculated across the whole file?
>
>> What I would like to do is spawn separate processes and load only the delta
>> data. The data must be 100% reliable, so I'm leery of using something like
>> difflib. I also want to make sure that my code scales, since the number of
>> files is ever-increasing.
>
> Why don't you like difflib? AFAIK it has no known bugs.
>
> --
> Gabriel Genellina
>