three column dataset - additions and deletions

Tim Harig usernet at ilthio.net
Thu Dec 2 20:42:33 EST 2010


On 2010-12-03, draeath <draeath.spamtrap at gmail.com> wrote:
> On Thu, 02 Dec 2010 22:55:53 +0000, Tim Harig wrote:
>
> Thanks for taking the time to check in on this, Tim!
>
>> So, basically, you want to store a local copy of the data and sync it to
>> the original.
> In a way. I only need to store one copy of the data, and make note of 
> changes between it and the current data.

Perhaps I am missing a sublty that makes those statements different.

>> You need to differentiate between the in memory data model and the
>> storage model.  Since this data comes from a database in the first
>> place, I would dump it to an sqlite3 database from the beginning.  You
>> can use this to store, modify, and change the values as you receive them
>> from the database.
> I thought of doing that, but given that I only need to store a single 
> instance of the data, a simple pickle will do the job nicely (am I 
> correct in reading that it can save/load any python object?)

So, you start by dumping the data from the remote server into an sqlite3
database table.  What you end up with is a record=record copy of the
original query (plus any other meta data that you want to add).  Then,
when the data changes, you apply those same changes to your local table
(or just regenerate it since you seem to be pulling all of the information
from the server anyway.  The result is a *single instance of the data*.

Why you would want to replace this with a pickle of a nested set up tuples
or a homebrew on disk data structure is beyond me.  Using sqlite3 is almost
certainly faster and more functional then anything you are going to create
without some serious work.

>> If you are looking for in-memory structures, then you haven't really
>> provided us with enough information on the significance and organization
>> of the data.
> The data columns:
> Long Int, String (under 30 chars), String (over 100 chars)
> The rows can scale up to hundreds, perhaps thousands.

Then those are the columns that you create for your local table.

> The integer is the database key, the shorter string is a user name, and 
> the longer string is an access control definition. The whole idea of this 
> script is to check, daily, for any added or removed users - or any 
> altered access control definition.

The question is how are you going to use this information once you have
mirroed it locally.  Most likely, from you description, you just need to
access it as a local read only data store.  Now compare the differences in
how you would acces the data:

pickle method:
1. You have to load the entire pickle into memory.
2. Unless you add some kind of tree or indexing mechanism, you will have to
	walk through an average of 1/2 of the records to find the matching
	id.  If you do use an advanced mechanism you have to create the
	code that inserts and locates the data.

sqlite3 method:
1. You open the file using the sqlite3 connector which does not have to
	read all of the data into memory.
2. You use a select query to get just the record for the id that you are
	looking for.  sqlite3 has already provided you with optimized
	lookup and indexing capability, as well as modification
	operations, etc (most likely written in C).

As an added bonus, you don't have to worry about locking issues, to keep
the clients from accesses the datastore and receiving an inconsistant
copy, while you are making your periodic updates to the database

Summary: the pickle method is reinventing the wheel.  You can do what has
	already been done for you with the sqlite3 module (and library)
	that has already been written for you; but, getting anything
	near the same functionality is going to require considerable effort
	on your part; and you are probably going to have to write C to get
	the equivilant performance.

Which seems like a better option to you?

> I realize this could likely all be done from inside the database itself - 
> but altering the DB itself is not an option (as the product vendor is 
> very touchy about that, and altering it can null our support agreement)

Altering the remote database is not an option; but, I am talking about
modifying only your local copy.  If you can rewrite your pickle file,
then you can modify the sqlite3 file.



More information about the Python-list mailing list