[Tutor] Optimal solution in dealing with huge databases in python

Shadab Sayani shadabsayani at yahoo.com
Fri Jan 26 03:15:41 CET 2007


Hi,
  
I got your point.But before inserting data I need to store it into  a file in a format supported by postgresql.Wont this operation incur a  performance hit as it includes writing to a file which is on disk?
  
  Thanks and Regards,
  Shadab.
  
johnf <jfabiani at yolo.com> wrote:  On Wednesday 24 January 2007 19:22, Shadab Sayani wrote:
> Hi,
>   I am working in a biodatabases project.The data I need to deal with is 
> in 100s of GB.I am using postgresql backend and SQLALCHEMY ORM.I need  to
> read the bio datafiles and parse them and then store them in  database.I am
> in the process of storing them. I used the session,flush concept in
> SQLALCHEMY.Initially I used to  flush every query immediately.Later I
> realised that the queries are  independent of each other and so started
> flushing 3-5 Lakh insert  queries at a time.This increased the response
> time.But the memory is  overflowing.Then I released not-in-use memory using
> del command in  python still there is no use as this del statement can only
> free part  of memory.I need to increase the above 3-5 lakh number to a much
> large  one to get a real time response.Other wise my estimation is it will 
> take 1 year to just insert the data into the database.From postgresql  side
> also I turned off WAL. Please suggest some viable solution to handle such
> enormous data from  python.Is there a better solution than SQL alchemy?Any
> solution that  speeds up my program is highly appreciated. Thanks and
> Regards,
>   Shadab.
>
>  Send instant messages to your online friends http://uk.messenger.yahoo.com

To tell the truth I know nothing about SQLALCHEMY ORM.  But I do know how to 
load large amounts of data into Postgres.  You use the 'copy ... from' 
command.  I once had 2 gigs of data and was surprised how fast it added the 
data.  Two things to remember:
1.  no indexes
2.  you have to format the data correctly - see Postgres DOC's

BTW I use psycopg for my connection
-- 
John Fabiani
_______________________________________________
Tutor maillist  -  Tutor at python.org
http://mail.python.org/mailman/listinfo/tutor


 Send instant messages to your online friends http://uk.messenger.yahoo.com 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/tutor/attachments/20070126/1c167d54/attachment-0001.htm 


More information about the Tutor mailing list