how not to run out of memory in cursor.execute
Jack Diederich
jack at performancedrivers.com
Mon Jun 5 15:45:41 EDT 2006
On Mon, Jun 05, 2006 at 07:34:05PM +0100, Steve Holden wrote:
> amberite wrote:
> > johnlichtenstein at gmail.com wrote:
> >
> >>I am using cx_Oracle and MySQLdb to pull a lot of data from some tables
> >>and I find that the cursor.execute method uses a lot of memory that
> >>never gets garbage collected. Using fetchmany instead of fetchall does
> >>not seem to make any difference, since it's the execute that uses
> >>memory. Breaking the query down to build lots of small tables doesn't
> >>help, since execute doesn't give its memory back, after reading enough
> >>small tables execute returns a memory error. What is the trick to get
> >>memory back from execute in cx_Oracle and MySQLdb?
> >
> >
> > cx_Oracle and MySQLdb must be handled differently, due to the fact that
> > MySQL does not actually have cursors (MySQLdb fakes them for you).
> >
> > To handle large resultsets efficiently in cx_Oracle simply use the
> > cursor iteration idiom:
> >
> > for row in cursor:
> >     # do stuff with the row
> >
> > cx_Oracle takes care of the fetching for you, and your memory usage
> > should remain fairly constant when using this idiom.
> >
> > To handle large resultsets in MySQLdb, you have to resort to multiple
> > queries:
> >
> > l = 1000
> > o = 0
> >
> > # MySQLdb's paramstyle is 'format': use %s placeholders, not %d.
> > cursor.execute('SELECT foo FROM Bar LIMIT %s OFFSET %s', (l, o))
> > rows = cursor.fetchall()
> > while len(rows) > 0:
> >     # process the fetched rows
> >     o += l
> >     cursor.execute('SELECT foo FROM Bar LIMIT %s OFFSET %s', (l, o))
> >     rows = cursor.fetchall()
> >
> > cursor.close()
> >
> > As you can see, the MySQLdb version is more involved, due to the lack
> > of real cursor support in the MySQL database. Any database with good
> > cursor support will likely have good cursor iteration support in the
> > corresponding DBAPI driver.
> >
> > Hope this helps,
> >
> > L. Daniel Burr
> >
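The multi-query pattern quoted above can be packaged as a generator so the paging bookkeeping stays out of the calling code. This is only a sketch: the table and column names are illustrative, and the FakeCursor class below is a stand-in for a real MySQLdb cursor, included only so the logic can be exercised without a database.

```python
def paged_rows(cursor, chunk=1000):
    """Yield rows one at a time, re-issuing the query with a growing
    OFFSET so at most `chunk` rows are held in memory at once."""
    offset = 0
    while True:
        # MySQLdb's paramstyle is 'format': use %s even for integers.
        cursor.execute('SELECT foo FROM Bar LIMIT %s OFFSET %s',
                       (chunk, offset))
        rows = cursor.fetchall()
        if not rows:
            break
        for row in rows:
            yield row
        offset += chunk

class FakeCursor:
    """Stand-in for a DB-API cursor, for demonstration only."""
    def __init__(self, data):
        self._data = list(data)
        self._rows = []
    def execute(self, sql, params):
        chunk, offset = params
        self._rows = self._data[offset:offset + chunk]
    def fetchall(self):
        return self._rows
```

The caller then just writes `for row in paged_rows(cursor): ...`, the same shape as the cx_Oracle idiom.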
> The MySQLdb solution you give is way more complicated than it needs to
> be, thereby skewing your opinion towards cx_Oracle unnecessarily.
>
> Look up the .fetchmany() method of cursors in the DB API. You only ever
> need to execute a single query, no matter how large the result set:
> simply keep calling .fetchmany(N) (where N is whatever chunk size your
> testing shows to be optimal) until it returns fewer than N rows, at
> which point you have exhausted the query.
>
> It's very little more effort to wrap this all up as a generator that
> effectively allows you to use the same solution as you quote for cx_Oracle.
MySQL will hold table locks until all the results have been fetched, so even
though the DB API allows fetchone() and fetchmany(), using them with MySQLdb
is dangerous.
-Jack
More information about the Python-list mailing list