custom data warehouse in python vs. out-of-the-box ETL tool

Martin P. Hellwig martin.hellwig at dcuktec.org
Wed Sep 23 15:28:45 EDT 2009


snfctech wrote:
> @Martin:  I originally thought that there was nothing "magical" about
> building a data warehouse, but then I did a little research and
> received all sorts of feedback about how data warehouse projects have
> notorious failure rates, that data warehouse design IS different than
> normal RDBMS - and then there's the whole thing about data marts vs.
> warehouses, Kimball vs. Inmon, star schemas, EAV tables, and so on.
> So I started to think that maybe I needed to get a little better read
> on the subject.

Yes, the failure rate for data warehouse projects is quite high, but so is 
the failure rate of other IT projects that have nothing to do with data 
warehouses.

Data warehouse design is not that much different from 'normal' RDBMS 
design; you face the same decisions, for example:
- Would I rather have multiple copies of the data than slow or complicated 
access? If yes, how do I ensure integrity?
- How do I do access control at the value level?
- Do I need failover or load balancing? If yes, how much of it can I do 
at the application level?

The thing is, if you have never designed a database from a database point 
of view, because you used abstractions that hide these ugly details, then 
yes, data warehouses are different from normal RDBMS.

Yes, you can make it all sound very complicated by throwing in PHB words 
like meta schemas, dimensional approach, subject orientated, etc.
I prefer to see it like this: I have a couple of data sources and I want 
to combine them in a way that lets me do neat stuff with them. The 
'combine them' part is the data warehouse design; the 'neat stuff' part 
is the clients that use that data warehouse. If you wish, you can make it 
more complicated by saying that your data sources are also your clients, 
but that is another step.
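As a rough illustration of the 'combine them' and 'neat stuff' parts, 
here is a minimal Python sketch, again using only the standard library 
(csv, sqlite3). The two sources (a CSV customer export and a list of 
order rows) and all the names are hypothetical; the point is only that 
the combined store can answer a question neither source answers on its 
own.

```python
"""Sketch of combining two hypothetical data sources into one store."""
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export from a CRM system.
CUSTOMERS_CSV = """customer_id,name,country
1,Alice,NL
2,Bob,UK
"""

# Hypothetical source 2: (order_id, customer_id, amount) rows from an
# order-processing system.
ORDERS = [
    (100, 1, 25.0),
    (101, 1, 10.0),
    (102, 2, 7.5),
]


def build_warehouse() -> sqlite3.Connection:
    """The 'combine them' part: load both sources into one SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customers, amount REAL)"
    )
    rows = csv.DictReader(io.StringIO(CUSTOMERS_CSV))
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(int(r["customer_id"]), r["name"], r["country"]) for r in rows],
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
    conn.commit()
    return conn


def revenue_per_country(conn: sqlite3.Connection) -> list:
    """A 'neat stuff' client: a question neither source answers alone."""
    return conn.execute(
        "SELECT c.country, SUM(o.amount) "
        "FROM orders o JOIN customers c USING (customer_id) "
        "GROUP BY c.country ORDER BY c.country"
    ).fetchall()


if __name__ == "__main__":
    conn = build_warehouse()
    print(revenue_per_country(conn))  # [('NL', 35.0), ('UK', 7.5)]
```

Whether the combined store is SQLite, a star schema in a big RDBMS, or 
something else is then a detail you choose based on the decisions listed 
earlier.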

I guess you can sum all this up by saying that a data warehouse is a 
method of gathering already existing data and presenting it in a 
different way. The concept is simple; the details can be as complicated 
as you want or need them to be.

-- 
MPH
http://blog.dcuktec.com
'If consumed, best digested with added seasoning to own preference.'


