[BangPypers] Unstructured data and python

Pradeep Gowda pradeep at btbytes.com
Fri Oct 16 20:47:48 CEST 2009


On Fri, Oct 16, 2009 at 2:31 PM, Carl Trachte <ctrachte at gmail.com> wrote:
> On 10/16/09, Ramdas S <ramdaz at gmail.com> wrote:
>> Has anyone worked on/seen any project which involves migrating unstructured
>> data, mostly text files, to a reasonably indexed database, preferably written
>> in Python or with Python APIs.
>> I am even OK if it's a commercial project.
>>
>
> FWIW, when I worked in a Microsoft SQL Server environment, I used DTS for SQL
> Server 7 or 2000 with the win32com modules, and SSIS with IronPython for
> later versions.
>
> It was usually a standard process of gluing together a bunch of data
> in a CSV file with Python, then automating the DTS or SSIS program to
> dump the data into a database table or series of tables.
>
> You could probably do something similar with MySQL or Postgres.  The
> hard part was always writing the Python to do the situation-specific
> initial crunch of the data.
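The situation-specific "initial crunch" Carl describes might look something like the sketch below: normalize messy delimited text into clean rows, then emit CSV for a bulk loader (DTS/SSIS or otherwise). The pipe-delimited record layout here is invented purely for illustration.

```python
import csv
import io

# Hypothetical "initial crunch": turn messy pipe-delimited text into
# clean rows that any bulk loader can ingest. The record layout
# (name|date|value) is made up for illustration.
def crunch(lines):
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks and comments
            continue
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:                    # keep only well-formed records
            rows.append(parts)
    return rows

raw = ["alice | 2009-10-16 | 42", "", "# a comment", "bob|2009-10-15|7"]
rows = crunch(raw)

buf = io.StringIO()                            # stand-in for the real .csv file
csv.writer(buf).writerows(rows)
```

In practice the cleanup rules are the hard part, as Carl says; the CSV-writing half rarely changes.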

I believe what you are looking for is an ETL (extraction,
transformation, and loading) application.
It can be as simple as a couple of Python scripts, especially if it is a
one-off job.
You can use web.py's sql module or SQLAlchemy (more work) to generate
SQL statements if you don't like writing SQL yourself.
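For a one-off job, the loading half really is just a couple of lines. Here is a minimal sketch using the stdlib sqlite3 module with parameterized statements (SQLAlchemy or web.py's sql module would look similar at this level); the table and column names are hypothetical.

```python
import sqlite3

# Cleaned rows from the transformation step (hypothetical data).
rows = [("alice", "2009-10-16", 42), ("bob", "2009-10-15", 7)]

conn = sqlite3.connect(":memory:")  # swap in your real database connection
conn.execute("CREATE TABLE events (name TEXT, day TEXT, value INTEGER)")

# Parameterized INSERT: the driver builds the SQL values for you,
# so you never hand-assemble SQL strings from the data.
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
conn.commit()

total = conn.execute("SELECT SUM(value) FROM events").fetchone()[0]
```

The same executemany-over-cleaned-rows pattern carries over to MySQL or Postgres drivers with only the connect call changing.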


If the data loading/cleaning/transformation has to happen on a regular
basis, you may want to investigate something like
http://www.pentaho.com/products/data_integration/. I have had fairly
decent success using the Pentaho Chef suite (link above) for ETL on
telco OLTP data, with PostgreSQL as the destination DB.

+PG

