psycopg2 and large queries

Laszlo Nagy gandalf at shopzeus.com
Thu Dec 18 10:34:38 EST 2008


psycopg2 is said to be db api 2.0 compilant, but apparent it is buggy.
By default, when I create a cursor with

cur = conn.cursor()

then it creates a cursor that will fetch all rows into memory, even if
you call cur.fetchone() on it. (I tested it, see below.)

I was looking for psycopg2 documentation, but I found nothing. However,
I found some posts telling that named cursors do support fetching a
single row at a time. Here is how to create a named cursor:

cur = conn.cursor('mycursor')

This is very strange, because DB API 2.0 does not have this feature. Why
this feature was created, and how to use it? Not documented.

Anyway, I wrote a program that uses named/nameless cursors, sets
cur.arraysize to different values before callig cur.execute, then
fetches all rows with fetchone(). I used the following query on a bigger
table:

select name from product limit 1000000

Here are the test results for fetchone():

Named cursor, arraysize=1    2613 rec/sec, query opens in 0.005 sec,
uses only a few memory
Named cursor, arraysize=1000    2831 rec/sec, query opens in 0.005 sec,
uses only a few memory
Nameless cursor, arraysize=1 41335 rec/sec, query opens in 28 sec, uses
100MB+ memory
Nameless cursor, arraysize=1 39602 rec/sec, query opens in 25.8 sec,
uses 100MB+ memory

It is clear that named cursors have very bad performance, I cannot use
them. Nameless cursors cannot be used either, because they are stressing
the system, put 100% disk I/O and big memory usage, without any good reason.

The only one solution I found is to use named cursors, and use
fetchmany(100) instead of fetchone(). This results in fast opening
(0.005 sec) of the cursor, and good speed (30 000 rec/sec, about 75% of
the nameless version). (Do I really need to implement buffering in
Python to be efficient???)

However, this is also not usable, because named cursors do not have a
".description" property! You can try this:

cur = conn.cursor('mycursor')
cur.execute('select name from product limit 1000000')
print repr(cur.description)  # -> None

This is unacceptable! This is not DB API 2.0 compilant. I have to know
the names of the columns, how can I do that?

What am I doing wrong? Please help me!

Thanks,

   Laszlo

p.s.: I tried to subscribe to the psycopg mailing list, but the
confirmation email did not arrive in 6 hours...








More information about the Python-list mailing list