python ETL

Jorgen Grahn jgrahn-nntq at algonet.se
Thu Aug 4 13:55:04 EDT 2005


On Mon, 01 Aug 2005 10:49:36 -0500, Paul Watson <pwatson at redlinepy.com> wrote:
> arielgr at gmail.com wrote:
>> Hi,
>> My company is involved in the development of many data marts and
>> data-warehouses, and I am currently looking into migrating our old set of
>> tools (written in Korn) to a new, more dynamic and robust one.
...
> However, I would have to assume that if homebrew shell scripts have been 
> doing the work adequately, then the marts and warehouses are not very 
> large and the datasets are primarily text rather than binary.
>
> If this is the case and you are only seeking incremental improvement, 
> then Python would be a very good choice.  Perl would also do the job. 
> Just about any language would work.  Yes, there are many reasons to 
> choose Python.  However, you would have to build any scalability and 
> metadata management.
>
> If you seek a radical improvement, it is available, but I do not know of 
> any free tools that will do it.  A question like this will probably not 
> be answered in a newsgroup post or even the exchange of a few emails.
>
> Choosing an effective tool for the organization is not a trivial 
> process.  It requires knowledge of both the tools and the organization's 
> methodologies and processes.  If you do not have staff who can do this, 
> most companies find it is much cheaper and faster to pay someone who 
> does know (a consultant) to assist them in assessing their requirements, 
> tool selection, and forming an implementation plan.

But remember: sometimes, a bunch of shell scripts or a Python script is the
right tool for the problem.

Sometimes, I think a bunch of shell scripts is the right tool for a lot of
the problems people throw XMLthis, XMLthat, .NET, SQL servers, consultants
and money at.

There is no real reason (with the little information we have[1]) to believe
that the original poster is doing his employer a disservice by looking at
doing things himself, in plain old Python, instead of letting someone tear
down and rebuild whatever workflow/methodology/process stuff they have right
now.
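
To make "plain old Python" a bit more concrete, here is a rough sketch of
what a small extract/transform/load job might look like using only the
standard library. Everything specific in it is an assumption made for
illustration and not taken from this thread: the CSV layout, the "sales"
table, and the function names extract, transform, load and run are all
hypothetical, and SQLite merely stands in for whatever database is really
in use.

```python
import csv
import io
import sqlite3

# Hypothetical input; a real job would read a file or a database dump.
SAMPLE = "customer,amount\n Alice ,10\nbob,5\nalice,7\ncarol,\n"


def extract(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Normalize customer names and convert amounts to integers.

    Rows with an empty amount are skipped rather than guessed at.
    """
    for row in rows:
        if not row["amount"].strip():
            continue
        yield (row["customer"].strip().lower(), int(row["amount"]))


def load(conn, records):
    """Insert the transformed records into a (hypothetical) sales table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()


def run(text, conn):
    """Run the three steps and return the total amount per customer."""
    load(conn, transform(extract(text)))
    query = (
        "SELECT customer, SUM(amount) FROM sales "
        "GROUP BY customer ORDER BY customer"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(run(SAMPLE, sqlite3.connect(":memory:")))
```

Whether something this small is enough depends on the data volumes and
metadata needs Paul mentions above; the sketch only shows that the basic
plumbing is not much code.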

/Jorgen
[1] Unless "ETL" and "data mart" carry some deep meaning which
    I've missed, that is.

-- 
  // Jorgen Grahn <jgrahn@       Ph'nglui mglw'nafh Cthulhu
\X/                algonet.se>   R'lyeh wgah'nagl fhtagn!


