How would you design scalable solution?

Jonathan Gardner jgardner at jonathangardner.net
Tue Oct 27 20:12:08 EDT 2009


On Oct 27, 10:10 am, Bryan <bryanv... at gmail.com> wrote:
>
> How else to keep a record of every transaction, but not have the speed
> of the question "How many Things in Bucket x" depend on looking @ every
> transaction record ever made?

You can have three different tables in your database:

(1) The transaction log. (You described it above.)

(2) What the current state of the entire system is---where everything
is.

(3) A materialized view of the total of what is in each bucket.
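
As a rough sketch of those three tables, assuming SQLite through
Python's sqlite3 module (the table and column names here are made up
for illustration, not taken from your system):

```python
import sqlite3

# Hypothetical schema: every table and column name is illustrative only.
SCHEMA = """
CREATE TABLE transaction_log (          -- (1) every move ever made
    id          INTEGER PRIMARY KEY,
    item        TEXT NOT NULL,
    from_bucket TEXT,                   -- NULL when the item first appears
    to_bucket   TEXT NOT NULL,
    moved_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE item_location (            -- (2) where everything is right now
    item   TEXT PRIMARY KEY,
    bucket TEXT NOT NULL
);
CREATE TABLE bucket_total (             -- (3) summary: count per bucket
    bucket TEXT PRIMARY KEY,
    total  INTEGER NOT NULL CHECK (total >= 0)
);
"""


def create_schema(conn):
    """Create the three tables on an open connection."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(tables)
```

SQLite has no built-in materialized views, so in this sketch (3) is an
ordinary table that the application has to keep up to date itself.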

Note that you didn't specify that the system should answer the
question "What is the history of this item?" or "What were the
historical contents of this bucket?" Because of this, you really don't
need (1). GnuCash has this requirement since people would like to go
back in time and see what happened historically.

Your requirements also don't specify that the system should answer the
question, "What is in this bucket right now?" Because of that, you
really don't need to expose the data in (2). However, I assume you'll
need some way of saying, "Can't move X from A to B because X isn't in
A!" so (2) is necessary.

(3) gives you exactly what you need, with constant-time lookup by
bucket ID if you use a hash index, regardless of the number of buckets.
You can also obtain the values of all buckets in time linear in the
number of buckets.
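
To illustrate the access pattern only, here is a small sketch that uses
a Python dict-based Counter as a stand-in for a hash-indexed summary
table. The sample items and bucket names are invented. Asking about one
bucket is a single hash lookup, and listing every bucket is one pass
over the summary, never over the transaction log:

```python
from collections import Counter

# (2) current state: item -> bucket (made-up sample data)
item_location = {"apple": "A", "pear": "A", "plum": "B"}

# (3) summary, built once here; in a real system it would be adjusted
# on every move instead of being recomputed
bucket_total = Counter(item_location.values())

# One bucket: a single hash lookup, independent of how many buckets exist
print(bucket_total["A"])

# All buckets: one pass over the summary
for bucket, total in sorted(bucket_total.items()):
    print(bucket, total)
```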

Do some research into what a "materialized view" is and how to keep it
in sync with the actual state of the data so that you can do this
properly. I believe that should solve your problem.

If you intend to implement this in memory by yourself, please
familiarize yourself with the various issues you will run into with
transactional and persistent data, especially with multi-threading if
you allow it.
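
A minimal in-memory sketch, assuming one coarse lock is acceptable. The
BucketStore class and its method names are hypothetical. It keeps the
three structures in step under threads, but it does nothing about
persistence or crash recovery, which are the harder problems:

```python
import threading
from collections import Counter


class BucketStore:
    """Hypothetical in-memory store; not persistent, illustration only."""

    def __init__(self):
        self._lock = threading.Lock()
        self.log = []            # (1) transaction log
        self.location = {}       # (2) item -> current bucket
        self.totals = Counter()  # (3) bucket -> item count

    def add(self, item, bucket):
        with self._lock:
            if item in self.location:
                raise ValueError(f"{item} already exists")
            self.log.append((item, None, bucket))
            self.location[item] = bucket
            self.totals[bucket] += 1

    def move(self, item, src, dst):
        # All three structures change under one lock, so no other
        # thread can observe them half-updated.
        with self._lock:
            if self.location.get(item) != src:
                raise ValueError(
                    f"Can't move {item} from {src} to {dst}: not in {src}")
            self.log.append((item, src, dst))
            self.location[item] = dst
            self.totals[src] -= 1
            self.totals[dst] += 1

    def total(self, bucket):
        with self._lock:
            return self.totals[bucket]


store = BucketStore()
for i in range(100):
    store.add(f"item{i}", "A")


def worker(start):
    for i in range(start, start + 25):
        store.move(f"item{i}", "A", "B")


threads = [threading.Thread(target=worker, args=(s,)) for s in (0, 25, 50, 75)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.total("A"), store.total("B"))
```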

If you use a proper database to manage transactions for you, there is
no reason that the transaction log, current contents, and current
summary materialized view should ever get out of sync with each other,
as long as your transactions are all correct. For best results, write
a database function that does what you need and have all the clients
update the tables through it.
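
Here is one way that single entry point could look. SQLite has no
stored procedures, so this sketch uses a Python function wrapped in a
transaction as a stand-in for a real database function. The schema,
add_item, and move_item are all hypothetical names. Either all three
tables change or none do:

```python
import sqlite3


def open_db():
    """Create an in-memory database with an illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE transaction_log (id INTEGER PRIMARY KEY, item TEXT,
                                      from_bucket TEXT, to_bucket TEXT);
        CREATE TABLE item_location (item TEXT PRIMARY KEY,
                                    bucket TEXT NOT NULL);
        CREATE TABLE bucket_total (bucket TEXT PRIMARY KEY,
                                   total INTEGER NOT NULL DEFAULT 0);
    """)
    return conn


def _bump(conn, bucket, delta):
    # Make sure the summary row exists, then adjust it.
    conn.execute("INSERT OR IGNORE INTO bucket_total (bucket) VALUES (?)",
                 (bucket,))
    conn.execute("UPDATE bucket_total SET total = total + ? WHERE bucket = ?",
                 (delta, bucket))


def add_item(conn, item, bucket):
    with conn:  # commits on success, rolls back on any exception
        conn.execute("INSERT INTO item_location VALUES (?, ?)", (item, bucket))
        conn.execute("INSERT INTO transaction_log (item, from_bucket, to_bucket)"
                     " VALUES (?, NULL, ?)", (item, bucket))
        _bump(conn, bucket, 1)


def move_item(conn, item, src, dst):
    with conn:
        # The WHERE clause doubles as the "is X really in A?" check.
        cur = conn.execute(
            "UPDATE item_location SET bucket = ? WHERE item = ? AND bucket = ?",
            (dst, item, src))
        if cur.rowcount != 1:
            raise ValueError(
                f"Can't move {item} from {src} to {dst}: not in {src}")
        conn.execute("INSERT INTO transaction_log (item, from_bucket, to_bucket)"
                     " VALUES (?, ?, ?)", (item, src, dst))
        _bump(conn, src, -1)
        _bump(conn, dst, 1)


conn = open_db()
add_item(conn, "apple", "A")
move_item(conn, "apple", "A", "B")
totals = dict(conn.execute("SELECT bucket, total FROM bucket_total"))
print(totals)
```

If clients only ever call add_item and move_item, a failed move raises
inside the transaction and is rolled back, so the log, the current
state, and the summary cannot drift apart.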


