how not to run out of memory in cursor.execute
Steve Holden
steve at holdenweb.com
Mon Jun 5 14:34:05 EDT 2006
amberite wrote:
> johnlichtenstein at gmail.com wrote:
>
>>I am using cx_Oracle and MySQLdb to pull a lot of data from some tables
>>and I find that the cursor.execute method uses a lot of memory that
>>never gets garbage collected. Using fetchmany instead of fetchall does
>>not seem to make any difference, since it's the execute that uses
>>memory. Breaking the query down to build lots of small tables doesn't
>>help, since execute doesn't give its memory back, after reading enough
>>small tables execute returns a memory error. What is the trick to get
>>memory back from execute in cx_Oracle and MySQLdb?
>
>
> cx_Oracle and MySQLdb must be handled differently, due to the fact that
> MySQL does not actually have cursors (MySQLdb fakes them for you).
>
> To handle large resultsets efficiently in cx_Oracle simply use the
> cursor iteration idiom:
>
> for row in cursor:
>     # do stuff with the row
>
> cx_Oracle takes care of the fetching for you, and your memory usage
> should remain fairly constant when using this idiom.
>
> To handle large resultsets in MySQLdb, you have to resort to multiple
> queries:
>
> l = 1000
> o = 0
>
> # Note: MySQLdb's paramstyle is 'format', so placeholders are always %s
> cursor.execute('SELECT foo FROM Bar LIMIT %s OFFSET %s', (l, o))
> rows = cursor.fetchall()
> while len(rows) > 0:
>     # process the fetched rows
>     o += l
>     cursor.execute('SELECT foo FROM Bar LIMIT %s OFFSET %s', (l, o))
>     rows = cursor.fetchall()
>
> cursor.close()
>
> As you can see, the MySQLdb version is more involved, due to the lack
> of real cursor support in the MySQL database. Any database with good
> cursor support will likely have good cursor iteration support in the
> corresponding DBAPI driver.
>
> Hope this helps,
>
> L. Daniel Burr
>
The MySQLdb solution you give is more complicated than it needs to
be, which skews the comparison towards cx_Oracle unnecessarily.
Look up the .fetchmany() method of cursors in the DB API: you need
only execute a single query, no matter how large the result set.
Simply keep calling .fetchmany(N) (where N is whatever testing has
shown to be your optimum chunk size) until it returns fewer than N
rows, at which point the query is exhausted.
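As a sketch of that pattern (using sqlite3 here purely so the example
runs standalone; the fetchmany loop is identical under MySQLdb or
cx_Oracle, and the table, column, and chunk size are made up):

```python
import sqlite3

# Stand-in database so the example is self-contained.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE bar (foo INTEGER)')
cur.executemany('INSERT INTO bar VALUES (?)', [(i,) for i in range(2500)])

CHUNK = 1000  # chunk size you would tune by testing

cur.execute('SELECT foo FROM bar')  # one query, however large the result
total = 0
while True:
    rows = cur.fetchmany(CHUNK)
    if not rows:          # fewer than CHUNK rows means we then hit the end
        break
    total += len(rows)    # "process" the chunk here

print(total)  # 2500
conn.close()
```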
It takes very little extra effort to wrap this up as a generator,
which effectively lets you use the same iteration idiom you quote for
cx_Oracle.
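Such a generator might look like the following (again with sqlite3
standing in for MySQLdb so the demo runs; the generator itself is
driver-agnostic, and the names are illustrative):

```python
import sqlite3

def result_iter(cursor, size=1000):
    """Yield rows one at a time, fetching them in size-row chunks."""
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            return
        for row in rows:
            yield row

# Demo data in an in-memory database.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE bar (foo INTEGER)')
cur.executemany('INSERT INTO bar VALUES (?)', [(i,) for i in range(5)])

cur.execute('SELECT foo FROM bar ORDER BY foo')
values = [foo for (foo,) in result_iter(cur, size=2)]
print(values)  # [0, 1, 2, 3, 4]
conn.close()
```

With this in place the MySQLdb client code becomes `for row in
result_iter(cursor):`, mirroring the cx_Oracle idiom.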
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Love me, love my blog http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden