how not to run out of memory in cursor.execute

Jack Diederich jack at performancedrivers.com
Mon Jun 5 15:45:41 EDT 2006


On Mon, Jun 05, 2006 at 07:34:05PM +0100, Steve Holden wrote:
> amberite wrote:
> > johnlichtenstein at gmail.com wrote:
> > 
> >>I am using cx_Oracle and MySQLdb to pull a lot of data from some tables
> >>and I find that the cursor.execute method uses a lot of memory that
> >>never gets garbage collected. Using fetchmany instead of fetchall does
> >>not seem to make any difference, since it's the execute that uses
> >>memory. Breaking the query down to build lots of small tables doesn't
> >>help, since execute doesn't give its memory back; after reading enough
> >>small tables, execute raises a MemoryError. What is the trick to get
> >>memory back from execute in cx_Oracle and MySQLdb?
> > 
> > 
> > cx_Oracle and MySQLdb must be handled differently, because MySQL does not
> > actually have server-side cursors (MySQLdb fakes them for you).
> > 
> > To handle large resultsets efficiently in cx_Oracle simply use the
> > cursor iteration idiom:
> > 
> > for row in cursor:
> >     # do stuff with the row
> > 
> > cx_Oracle takes care of the fetching for you, and your memory usage
> > should remain fairly constant when using this idiom.
> > 
> > To handle large resultsets in MySQLdb, you have to resort to multiple
> > queries:
> > 
> > limit = 1000
> > offset = 0
> > 
> > # MySQLdb uses %s placeholders for every parameter, including integers
> > cursor.execute('SELECT foo FROM Bar LIMIT %s OFFSET %s', (limit, offset))
> > rows = cursor.fetchall()
> > while len(rows) > 0:
> >     # process the fetched rows
> >     offset += limit
> >     cursor.execute('SELECT foo FROM Bar LIMIT %s OFFSET %s', (limit, offset))
> >     rows = cursor.fetchall()
> > 
> > cursor.close()
> > 
> > As you can see, the MySQLdb version is more involved, due to the lack
> > of real cursor support in the MySQL database.  Any database with good
> > cursor support will likely have good cursor iteration support in the
> > corresponding DBAPI driver.
> > 
> > Hope this helps,
> > 
> > L. Daniel Burr
> > 
> The MySQLdb solution you give is way more complicated than it needs to 
> be, thereby skewing your opinion towards cx_Oracle unnecessarily.
> 
> Look up the .fetchmany() method of cursors in the DB API. You only ever need
> to execute a single query, no matter how large the result set: simply keep
> calling .fetchmany(N) (where N is whatever you've decided by testing is your
> optimum chunk size) until it returns fewer than N rows, at which point you
> have exhausted the query.
> 
> It's very little more effort to wrap this all up as a generator that 
> effectively allows you to use the same solution as you quote for cx_Oracle.
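
Such a generator is only a few lines. A rough sketch (the query, the batch
size, and the name result_iter are placeholders for illustration, not anything
the drivers provide):

def result_iter(cursor, batch_size=1000):
    # Yield rows from an already-executed cursor in fetchmany() batches,
    # so the caller never holds more than batch_size rows at once.
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            yield row

cursor.execute('SELECT foo FROM Bar')
for row in result_iter(cursor):
    pass  # do stuff with the row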

MySQL will keep table locks until the results are all fetched, so even though
the DB API allows fetchone() and fetchmany(), using those with MySQLdb is
dangerous.
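
For what it's worth, with MySQLdb's default cursor the whole result set is
already buffered on the client by the time execute() returns, which is why
fetchmany() doesn't reduce memory use there. The unbuffered alternative is
the server-side cursor class, MySQLdb.cursors.SSCursor: it streams rows as
you fetch them, but it keeps the server-side result set (and those locks)
open until the last row has been read. A rough sketch, with the connection
details made up:

import MySQLdb
import MySQLdb.cursors

# Sketch only: host/user/passwd/db are placeholders.
conn = MySQLdb.connect(host='localhost', user='me', passwd='secret',
                       db='test', cursorclass=MySQLdb.cursors.SSCursor)
cursor = conn.cursor()
cursor.execute('SELECT foo FROM Bar')  # rows are not pulled client-side here
row = cursor.fetchone()
while row is not None:
    # do stuff with the row; the server keeps the result set (and locks)
    # open until the final row has been fetched
    row = cursor.fetchone()
cursor.close()
conn.close()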

-Jack



