python ETL

Paul Watson pwatson at redlinepy.com
Mon Aug 1 11:49:36 EDT 2005


arielgr at gmail.com wrote:
> Hi,
> My company is involved in the development of many data marts and
> data-warehouses, and I am currently looking into migrating our old set of
> tools (written in Korn) to a new, more dynamic and robust one. I am
> looking into python as I have heard that it could be a good contestant
> for the job, and wanted to know if anyone knew of an existing open
> source project which implements ETL using python, or any libraries that
> may ease the production of such tools.
> 
> Thanks.

Robert is right; you have not really given much information.

However, I would have to assume that if homebrew shell scripts have been 
doing the work adequately, then the marts and warehouses are not very 
large and the datasets are primarily text rather than binary.

If this is the case and you are only seeking incremental improvement, 
then Python would be a very good choice.  Perl would also do the job. 
Just about any language would work.  Yes, there are many reasons to 
choose Python.  However, you would have to build any scalability and 
metadata management yourself.
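
To give a rough idea of what "incremental improvement" over shell 
scripts might look like, here is a minimal sketch using only the 
standard library.  Everything in it is an assumption for illustration: 
the pipe-delimited sample data, the "sales" table, and the function 
names are hypothetical, and SQLite merely stands in for whatever 
database your marts actually use.

```python
import csv
import io
import sqlite3

# Hypothetical sample input; a real job would read a delimited text export.
RAW = """id|name|amount
1| Alice |10.50
2|Bob|not-a-number
3| Carol |7
"""


def extract(text):
    """Return an iterator of dicts, one per row of pipe-delimited text."""
    return csv.DictReader(io.StringIO(text), delimiter="|")


def transform(rows):
    """Trim names, parse numeric fields, and skip rows that fail to parse."""
    for row in rows:
        try:
            yield int(row["id"]), row["name"].strip(), float(row["amount"])
        except ValueError:
            # A real tool would record the rejected row somewhere; that
            # bookkeeping is part of what you would have to build yourself.
            continue


def load(conn, records):
    """Insert cleaned records into the hypothetical 'sales' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    """Run extract, transform, and load, then return (row count, total)."""
    load(conn, transform(extract(text)))
    return conn.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone()


conn = sqlite3.connect(":memory:")
result = run_etl(RAW, conn)
print(result)
```

Note what is missing: restartability, reject logging, lineage, and any 
way to scale past one process.  That is the part you would be writing 
and maintaining yourself.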

If you seek a radical improvement, it is available, but I do not know of 
any free tools that will provide it.  A question like this will probably 
not be answered in a newsgroup post or even an exchange of a few emails.

Choosing an effective tool for the organization is not a trivial 
process.  It requires knowledge of both the tools and the organization's 
methodologies and processes.  If a company does not have staff who can 
do this, it is usually much cheaper and faster to pay someone who does 
know (a consultant) to help assess the requirements, select a tool, and 
form an implementation plan.

Yes, your company staff can learn a lot by experimenting and playing 
with several tools, but shareholders might not view that approach as the 
most effective.


