how not to run out of memory in cursor.execute

Steve Holden steve at holdenweb.com
Mon Jun 5 14:34:05 EDT 2006


amberite wrote:
> johnlichtenstein at gmail.com wrote:
> 
>>I am using cx_Oracle and MySQLdb to pull a lot of data from some tables
>>and I find that the cursor.execute method uses a lot of memory that
>>never gets garbage collected. Using fetchmany instead of fetchall does
>>not seem to make any difference, since it's the execute that uses
>>memory. Breaking the query down to build lots of small tables doesn't
>>help either, since execute doesn't give its memory back; after reading
>>enough small tables, execute raises a memory error. What is the trick
>>to get memory back from execute in cx_Oracle and MySQLdb?
> 
> 
> cx_Oracle and MySQLdb must be handled differently, because MySQL does
> not actually have cursors (MySQLdb fakes them for you).
> 
> To handle large result sets efficiently in cx_Oracle, simply use the
> cursor iteration idiom:
> 
> for row in cursor:
>     # do stuff with the row
> 
> cx_Oracle takes care of the fetching for you, and your memory usage
> should remain fairly constant when using this idiom.
> 
> To handle large result sets in MySQLdb, you have to resort to multiple
> queries:
> 
> limit = 1000
> offset = 0
> 
> # MySQLdb's paramstyle is 'format': placeholders are always %s,
> # even for integer values such as these.
> cursor.execute('SELECT foo FROM Bar LIMIT %s OFFSET %s', (limit, offset))
> rows = cursor.fetchall()
> while len(rows) > 0:
>     # process the fetched rows
>     offset += limit
>     cursor.execute('SELECT foo FROM Bar LIMIT %s OFFSET %s', (limit, offset))
>     rows = cursor.fetchall()
> 
> cursor.close()
> 
> As you can see, the MySQLdb version is more involved, due to the lack
> of real cursor support in the MySQL database.  Any database with good
> cursor support will likely have good cursor iteration support in the
> corresponding DBAPI driver.
> 
> Hope this helps,
> 
> L. Daniel Burr
> 
The MySQLdb solution you give is way more complicated than it needs to
be, which skews the comparison towards cx_Oracle unnecessarily.

Look up the .fetchmany() method of cursors in the DB API. You only ever
need to execute a single query, no matter how large the result set: just
keep calling .fetchmany(N) (where N is whatever you've decided by testing
is your optimum chunk size) until it returns fewer than N rows, at which
point you have exhausted the result set.

It takes very little extra effort to wrap this up as a generator, which
lets you use exactly the same iteration idiom you quote for cx_Oracle.
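
For instance (result_iter is just a name I've made up here):

def result_iter(cursor, chunk_size=1000):
    # Yield rows one at a time, fetching them in chunks so that only
    # chunk_size rows are held in memory at once.
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        for row in rows:
            yield row

cursor.execute('SELECT foo FROM Bar')
for row in result_iter(cursor):
    pass  # do stuff with the row, just as in the cx_Oracle idiom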

regards
  Steve

-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd          http://www.holdenweb.com
Love me, love my blog  http://holdenweb.blogspot.com
Recent Ramblings     http://del.icio.us/steve.holden



