[OT] large db question about no joins

Martin P. Hellwig martin.hellwig at dcuktec.org
Thu Apr 16 19:10:34 EDT 2009


Daniel Fetchinson wrote:
> [off but interesting topic]
<cut>

> What would be the corresponding database layout that would scale and I
> could get the total number of legs in the zoo or total number of
> animals in the zoo without join(s)?
> 
> Cheers,
> Daniel
> 
> [/off but interesting topic]
> 

That all comes down to the keywords, efficiency, robustness and 
performance and you can only pick one of them. So which two can you 
sacrifice? The good news is that it is only theoretical, if you have 
your requirements (i.e. the query with the right results has to return 
within an acceptable time for the user not to get frustrated) who cares 
if it is not as fast as it theoretically could be? Especially if this 
means you can sacrifice the theoretically performance for easier 
deployment and maintenance.

To get back to the layout question, if I need to have a count of the 
legs at any given time, I would like to see the business process that 
requires this operation. Using the business process I could probably 
narrow down the scope quite a bit like, it is not necessary to have a 
precise count at time *now* but it is good enough to have an exact count 
which is no more then 4 seconds old but the query does need to return 
within 50 ms.

One solution could be to build a central data warehouse where all info 
is stored in. Then have satellite db's that does ETL syncing with the 
central one. The layout of the satellites contain an optimised table 
version for the queries you want to throw at it.

If you need to scale it, that is a question of adding another satellite.
Efficiency is right out of the window because you store in essence just 
multiple copies of the same data just in another order. However it is 
robust (by having multiple copies) and has a predictable performance figure.

hth
-- 
mph



More information about the Python-list mailing list