[Tutor] Optimal solution in dealing with huge databases in python
Shadab Sayani
shadabsayani at yahoo.com
Thu Jan 25 04:22:46 CET 2007
Hi,
I am working on a bio-databases project. The data I need to deal with runs into hundreds of GB. I am using a PostgreSQL backend with the SQLAlchemy ORM. I need to read the bio datafiles, parse them, and then store them in the database; I am currently at the storing stage.
I used the session/flush mechanism in SQLAlchemy. Initially I flushed after every insert; later I realised the inserts are independent of one another, so I started flushing 3-5 lakh (300,000-500,000) inserts at a time. This improved throughput, but now memory overflows. I tried releasing no-longer-needed objects with Python's del statement, but that only frees part of the memory. I need to raise the 3-5 lakh batch size much further to get a reasonable running time; otherwise, by my estimate, just inserting the data will take a year. On the PostgreSQL side I have also turned off WAL.
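Here is a minimal sketch of the batching pattern I am describing, written against a current declarative-style SQLAlchemy API (my actual code and schema differ); the Sequence class and parse_records() parser below are hypothetical stand-ins for my real tables and datafile parser:

    from sqlalchemy import create_engine, Column, Integer, Text
    from sqlalchemy.orm import declarative_base, sessionmaker

    Base = declarative_base()

    class Sequence(Base):                      # hypothetical mapped table
        __tablename__ = "sequences"
        id = Column(Integer, primary_key=True)
        accession = Column(Text)
        residues = Column(Text)

    def parse_records(path):
        """Hypothetical stand-in for the real datafile parser."""
        with open(path) as fh:
            for line in fh:
                acc, seq = line.rstrip("\n").split("\t", 1)
                yield {"acc": acc, "seq": seq}

    engine = create_engine("postgresql://user:password@localhost/biodb")
    Base.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)
    session = Session()

    BATCH = 300000                             # the 3 lakh figure from above

    for i, rec in enumerate(parse_records("datafile.txt"), 1):
        session.add(Sequence(accession=rec["acc"], residues=rec["seq"]))
        if i % BATCH == 0:
            session.flush()                    # send the pending INSERTs
            session.expunge_all()              # drop flushed objects so the
                                               # session's identity map does
                                               # not keep growing in memory
    session.commit()

Even with the expunge step after every flush, memory consumption grows as I push the batch size higher.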
Please suggest a viable solution for handling such enormous data from Python. Is there a better solution than SQLAlchemy? Any suggestion that speeds up my program is highly appreciated.
Thanks and Regards,
Shadab.