[Tutor] Optimal solution in dealing with huge databases in python

Shadab Sayani shadabsayani at yahoo.com
Thu Jan 25 04:22:46 CET 2007


Hi,
  I am working in a biodatabases project.The data I need to deal with is  in 100s of GB.I am using postgresql backend and SQLALCHEMY ORM.I need  to read the bio datafiles and parse them and then store them in  database.I am in the process of storing them.
  I am using SQLAlchemy's session/flush mechanism. Initially I flushed every insert immediately. Later I realised that the inserts are independent of each other, so I started flushing 3-5 lakh (300,000-500,000) inserts at a time. This improved performance, but memory is overflowing. I then tried releasing unused objects with Python's del statement, but that only frees part of the memory. I need to push the batch size well beyond 3-5 lakh to get an acceptable load time; otherwise my estimate is that it will take a year just to insert the data into the database. On the PostgreSQL side I have also turned off WAL.
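  For reference, here is a minimal sketch of the batch-and-flush pattern described above, using a recent SQLAlchemy declarative API for brevity. The Record class, the records table, the connection URL, and parse_records() are hypothetical placeholders for the real schema and bio-file parser; the exact session methods may differ between SQLAlchemy versions.

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Record(Base):
    # Hypothetical mapped class standing in for the real bio record.
    __tablename__ = "records"
    id = Column(Integer, primary_key=True)
    sequence = Column(String)

def parse_records():
    # Hypothetical stand-in for the real bio-file parser: yields one
    # dict of column values per record.
    yield {"sequence": "ACGT"}

engine = create_engine("postgresql://user:password@localhost/biodb")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

BATCH_SIZE = 300_000  # the "3-5 lakh" batch size mentioned above

session = Session()
pending = 0
for fields in parse_records():
    session.add(Record(**fields))
    pending += 1
    if pending >= BATCH_SIZE:
        session.commit()       # write the batch; keeps the transaction from growing unboundedly
        session.expunge_all()  # drop the now-persistent objects so Python can reclaim the memory
        pending = 0
session.commit()               # write whatever is left over
session.close()

  Committing and expunging per batch keeps the session's identity map from accumulating hundreds of GB of objects, which is one common cause of the memory growth described above.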
  Please suggest a viable way to handle such enormous data from Python. Is there a better option than SQLAlchemy? Any solution that speeds up my program would be highly appreciated.
  Thanks and Regards,
  Shadab.
  
