custom data warehouse in python vs. out-of-the-box ETL tool

Tony Schmidt tschmidt at sacfoodcoop.com
Wed Sep 23 16:20:41 EDT 2009


@Martin:  Thanks for your great feedback.

So do you think it would be very beneficial for me to start with an
Inmon or Kimball book?  Or do you think it would be just leisure
reading and not very practical at best, and at worst fill my head
with needless jargon and inflexible dogmas?

I took a database class in college, understand the basic principles of
normalisation, and have built a few complicated RDBMS schemas from the
ground up.

On Sep 23, 12:28 pm, "Martin P. Hellwig" <martin.hell... at dcuktec.org>
wrote:
> snfctech wrote:
> > @Martin:  I originally thought that there was nothing "magical" about
> > building a data warehouse, but then I did a little research and
> > received all sorts of feedback about how data warehouse projects have
> > notorious failure rates, that data warehouse design IS different than
> > normal RDBMS - and then there's the whole thing about data marts vs.
> > warehouses, Kimball vs. Inmon, star schemas, EAV tables, and so on.
> > So I started to think that maybe I needed to get a little better read
> > on the subject.
>
> Yes, the failure rate for data warehouse projects is quite high, but
> so is that of other IT projects without data warehouses.
>
> Data warehouse design is not that much different than 'normal' RDBMS;
> you are facing the same decisions, for example:
> - Would I rather have multiple copies of data than slow or complicated
> access? If yes, how do I ensure integrity?
> - How do I do access control on value level?
> - Do I need failover or load balancing? If yes, how much of it can I do
> on the application level?
>
> The thing is, if you have never designed a database from a database
> point of view because you used abstractions that hide these ugly
> details, then yes, data warehouses are different than normal RDBMS.
>
> Yes, you can make it all sound very complicated by throwing in PHB
> words like meta schemas, dimensional approach, subject-oriented, etc.
> I like to see it more like this: I have a couple of data sources, and
> I want to combine them in a way that lets me do neat stuff with them.
> The 'combine them' part is the data warehouse design, the 'neat stuff'
> part is the clients that use that data warehouse. If you wish, you can
> start making it more complicated by saying that your data sources are
> also your clients, but that is another step.
>
> I guess you can sum all this up by saying a data warehouse is a method
> of gathering already existing data and presenting it in a different
> way; the concept is simple, the details can be as complicated as you
> want or need them to be.
>
> --
> MPH
> http://blog.dcuktec.com
> 'If consumed, best digested with added seasoning to own preference.'
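
To make the 'combine them' / 'neat stuff' split quoted above a bit more
concrete, here is a minimal sketch in Python using only the standard
library's sqlite3 module. Everything in it is an assumption made for
illustration: the two sources (POS_SALES and MEMBERS), the sales_fact
table, and the function names are hypothetical and do not come from
this thread. It also shows the "multiple copies of data" trade-off from
the list above, since the member name is deliberately copied into each
sales row to keep the client query simple.

```python
import sqlite3

# Hypothetical source data. In practice these rows would be pulled from
# separate systems; they are hard-coded here to keep the sketch runnable.
POS_SALES = [
    # (member_id, sale_date, amount)
    (1, "2009-09-01", 12.50),
    (2, "2009-09-01", 30.00),
    (1, "2009-09-02", 7.25),
]
MEMBERS = [
    # (member_id, name)
    (1, "Alice"),
    (2, "Bob"),
]


def build_warehouse(conn):
    """The 'combine them' part: load both sources into one table.

    The member name is duplicated into every row (denormalised), trading
    storage and integrity checks for simpler, faster reads.
    """
    names = dict(MEMBERS)
    conn.execute(
        "CREATE TABLE sales_fact ("
        "member_id INTEGER, member_name TEXT, sale_date TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
        [
            (member_id, names[member_id], sale_date, amount)
            for member_id, sale_date, amount in POS_SALES
        ],
    )
    conn.commit()


def sales_by_member(conn):
    """The 'neat stuff' part: a client query against the combined data."""
    rows = conn.execute(
        "SELECT member_name, SUM(amount) FROM sales_fact "
        "GROUP BY member_name ORDER BY member_name"
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_warehouse(conn)
    print(sales_by_member(conn))
```

The load step and the reporting step only share the sales_fact table, so
either side can grow more complicated later without the other changing.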




More information about the Python-list mailing list