Handling transactions in Python DBI module

Thu Feb 11 12:00:21 EST 2016

On Feb 10, 2016, at 8:14 PM, Chris Angelico <rosuav at gmail.com> wrote:
> 
> On Thu, Feb 11, 2016 at 4:06 PM, Frank Millman <frank at chagford.com> wrote:
>> A connection has 2 possible states - 'in transaction', or 'not in
>> transaction'. When you create the connection it starts off as 'not'.
>> 
>> When you call cur.execute(), it checks to see what state it is in. If the
>> state is 'not', it silently issues a 'BEGIN TRANSACTION' before executing
>> your statement. This applies for SELECT as well as other statements.
>> 
>> All subsequent statements form part of the transaction, until you issue
>> either conn.commit() or conn.rollback(). This performs the required action,
>> and resets the state to 'not'.
>> 
>> I learned the hard way that it is important to use conn.commit() and not
>> cur.execute('commit'). Both succeed in committing, but the second does not
>> reset the state, therefore the next statement does not trigger a 'BEGIN',
>> with possible unfortunate side-effects.
> 
> When I advise my students on basic databasing concepts, I recommend
> this structure:
> 
> conn = psycopg2.connect(...)
> 
> with conn, conn.cursor() as cur:
>    cur.execute(...)

And that is the structure I tend to use in my programs as well. I could, of course, roll the transaction control into that structure. However, that is a usage choice of the end user, whereas I am looking at the design of the connection/cursor itself. If I use psycopg, I get the transaction - even if I don't use a with block.

> 
> The transaction block should always start at the 'with' block and end
> when it exits. As long as you never nest them (including calling other
> database-using functions from inside that block), it's easy to reason
> about the database units of work - they always correspond perfectly to
> the code blocks.
> 
> Personally, I'd much rather the structure were "with
> conn.transaction() as cur:", because I've never been able to
> adequately explain what a cursor is/does. It's also a bit weird that
> "with conn:" doesn't close the connection at the end (just closes the
> transaction within that connection). But I guess we don't need a
> "Python DB API 3.0".

In my mind, cursors are simply query objects containing (potentially) result sets - so you could have two cursors, and loop through them something like "for result_1,result_2 in zip(cursor_1,cursor_2): ". Personally, I've never had a need for more than one cursor, but if you are working with large data sets, and need to work with multiple queries simultaneously without the overhead of loading the results into memory, I could see them being useful.

Of course, someone else might have a completely different explanation :-)
> 
> ChrisA
> -- 
> https://mail.python.org/mailman/listinfo/python-list