psycopg2 and large queries

Paul Boddie paul at boddie.org.uk
Thu Dec 18 12:34:39 EST 2008


On 18 Dec, 16:34, Laszlo Nagy <gand... at shopzeus.com> wrote:
> psycopg2 is said to be DB API 2.0 compliant, but apparently it is buggy.
> By default, when I create a cursor with
>
> cur = conn.cursor()
>
> then it creates a cursor that will fetch all rows into memory, even if
> you call cur.fetchone() on it. (I tested it, see below.)

Yes, I filed a bug against psycopg2 on this very subject, although the
project doesn't seem to have a bug-tracker any more.

> I was looking for psycopg2 documentation, but I found nothing. However,
> I found some posts telling that named cursors do support fetching a
> single row at a time. Here is how to create a named cursor:
>
> cur = conn.cursor('mycursor')
>
> This is very strange, because DB API 2.0 does not have this feature. Why
> was this feature created, and how is it used? It is not documented.

The reason is that PostgreSQL supports server-side cursors through a
DECLARE ... CURSOR statement, but that statement can only be used with
certain SQL statements. Other DB-API modules employ simple but
imperfect tricks to guess whether the statement being issued is
compatible with DECLARE ... CURSOR before automatically creating a
cursor, but the psycopg2 maintainers refused to introduce such a
feature as part of the default behaviour. I didn't pursue the avenue
of making a suitable set of patches to satisfy both them and myself,
and I subsequently went back to using pyPgSQL instead.
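For what it's worth, the named cursor is just psycopg2's way of wrapping
DECLARE ... CURSOR for you. A rough sketch of the two kinds of cursor
(connection details and table name invented here):

import psycopg2

conn = psycopg2.connect("dbname=test")      # connection details invented

# Default, client-side cursor: execute() pulls the whole result set
# into the client's memory before you fetch anything.
client_cur = conn.cursor()
client_cur.execute("select name from product")

# Named, server-side cursor: psycopg2 wraps the query in a
# DECLARE "mycursor" CURSOR FOR ... statement, so the rows stay
# on the server until you actually fetch them.
server_cur = conn.cursor('mycursor')
server_cur.execute("select name from product")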

[...]

> It is clear that named cursors have very bad performance, so I cannot use
> them. Nameless cursors cannot be used either, because they stress the
> system, causing 100% disk I/O and heavy memory usage, without any good reason.
>
> The only one solution I found is to use named cursors, and use
> fetchmany(100) instead of fetchone(). This results in fast opening
> (0.005 sec) of the cursor, and good speed (30 000 rec/sec, about 75% of
> the nameless version). (Do I really need to implement buffering in
> Python to be efficient???)

You really don't want to be traversing large data sets using fetchone,
anyway. My approach (using pyPgSQL) involves fetchmany and then
looping over each batch of results, if I really have to process the
data in Python; most of the time I can do the processing in the
database itself.
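
In other words, something like this (just a sketch; the batch size is
arbitrary and process() stands in for whatever work you can't push into
SQL):

cur = conn.cursor('mycursor')               # server-side cursor, as above
cur.execute("select name from product")
while True:
    rows = cur.fetchmany(100)               # pull a modest batch each time
    if not rows:
        break
    for row in rows:
        process(row)                        # placeholder for the real work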

> However, this is also not usable, because named cursors do not have a
> ".description" property! You can try this:
>
> cur = conn.cursor('mycursor')
> cur.execute('select name from product limit 1000000')
> print repr(cur.description)  # -> None
>
> This is unacceptable! This is not DB API 2.0 compliant. I have to know
> the names of the columns, how can I do that?
>
> What am I doing wrong? Please help me!

I'm not really a user of the description property, so I can't advise
you there. Strictly, psycopg2 is DB-API compliant, but the interaction
between result sets and cursors really doesn't live up to what the
specification suggests is possible.
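
That said, the named cursor appears to send the query to the backend only
when you first fetch from it, so it might be worth checking the description
after a fetch (untested):

cur = conn.cursor('mycursor')
cur.execute('select name from product limit 1000000')
rows = cur.fetchmany(100)
print repr(cur.description)   # may be populated once rows have actually arrived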

Paul


