Database Query Contains Old Data

M.-A. Lemburg mal at egenix.com
Tue Jun 3 06:07:07 EDT 2008


On 2008-06-03 00:17, James A. Donald wrote:
> On Wed, 21 May 2008 07:23:04 -0700 (PDT), Paul Boddie 
>> MySQL appears to use "repeatable read" by default [1] as its
>> transaction isolation level, whereas PostgreSQL (for example) uses
>> "read committed" by default [2]. I would guess that if you were using
>> PostgreSQL, this particular problem would not have occurred, but there
>> are other reasons to be aware of the effects of long duration
>> transactions in PostgreSQL, and the practice of periodically
>> performing a rollback would still be worth considering with that
>> database system.
> 
> If one has transactions open for a long time, or transactions that
> involve a great deal of data, this will result in poor performance or
> poor scalability. 

Poor performance is usually not an issue since databases are
optimized to work with transactions.

What's more important is that an open transaction will cause locks
on the tables you are writing to. Depending on the database
backend these locks may lock the entire table or just a few rows.

In any case, such locks prevent other connections from accessing the
tables or rows in question, and that will quickly turn into a major
problem if you have more than one connection to the database.
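
Here's a minimal sketch of the effect, using Python's bundled sqlite3
module. SQLite locks the whole database file on write, which makes the
blocking easy to see; server databases typically lock at table or row
level instead:

import sqlite3

# Two connections to the same database file; timeout=1 makes the
# blocked writer give up after one second instead of the default five.
conn_a = sqlite3.connect("demo.db", timeout=1)
conn_b = sqlite3.connect("demo.db", timeout=1)

conn_a.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER, name TEXT)")
conn_a.commit()

# Connection A starts a write transaction but does not commit ...
conn_a.execute("INSERT INTO items VALUES (1, 'pending')")

# ... so connection B cannot write until A commits or rolls back.
try:
    conn_b.execute("INSERT INTO items VALUES (2, 'blocked')")
except sqlite3.OperationalError as exc:
    print("second writer blocked:", exc)   # "database is locked"

conn_a.rollback()   # releases the lock; B could now retry its write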

> But one may have such large transactions without
> being aware of it.  Is there any way to make transaction size salient
> to the developer?  Any way to make sure one is committing as early and
> often as possible?

This depends on the database in question. By accessing system tables
directly you can usually find out a lot about the database and your
current transaction.
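
As a sketch only (assuming a PostgreSQL backend and the psycopg2
DB-API module; the pg_stat_activity column names vary between
PostgreSQL versions, and other databases expose similar catalogs,
e.g. MySQL's information_schema):

import psycopg2

conn = psycopg2.connect("dbname=mydb")    # hypothetical connection string
cur = conn.cursor()

# pg_stat_activity records when each session's current transaction
# started, which makes long-open transactions easy to spot.
cur.execute("""
    SELECT pid, now() - xact_start AS open_for, query
      FROM pg_stat_activity
     WHERE xact_start IS NOT NULL
     ORDER BY xact_start
""")
for pid, open_for, query in cur.fetchall():
    print(pid, open_for, query)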

That said, it's not a good idea to commit a logical transaction (i.e.
a user entering data) in multiple chunks. You'd lose the most important
feature of transactions: being able to roll back to the start
of the transaction.
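
A small sketch of what that buys you, using the stdlib sqlite3 module
and made-up table names: all statements of one logical transaction
share a single commit, so an error anywhere undoes everything back to
the start:

import sqlite3

conn = sqlite3.connect("orders.db")
conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER, qty INTEGER)")
conn.execute("CREATE TABLE IF NOT EXISTS audit (note TEXT)")
conn.commit()

try:
    # One logical transaction: all or nothing.
    conn.execute("INSERT INTO orders VALUES (1, 10)")
    conn.execute("INSERT INTO audit VALUES ('order 1 entered')")
    conn.commit()             # the only commit for this unit of work
except Exception:
    conn.rollback()           # undoes every statement since the start
    raise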

As others have mentioned, in systems that have long-running logical
transactions, it's usually best to collect the data until the very
end and then apply all changes in one go (and one database
transaction).
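
A sketch of that pattern (hypothetical table and column names):
collect the user's edits in plain Python data structures and only
touch the database at the very end:

import sqlite3

pending = []                          # changes collected while the user works

def record_change(item_id, qty):
    pending.append((item_id, qty))    # no database work yet, no locks held

record_change(1, 10)
record_change(2, 5)

# At the very end, apply everything in one short database transaction.
conn = sqlite3.connect("cart.db")
conn.execute("CREATE TABLE IF NOT EXISTS cart (item_id INTEGER, qty INTEGER)")
try:
    conn.executemany("INSERT INTO cart VALUES (?, ?)", pending)
    conn.commit()
except Exception:
    conn.rollback()
    raise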

Another problem with long-running transactions is that the data
in the tables may change after the start of the transaction. This
can result in invalid data being committed (e.g. one part of a
calculation uses the data as of time t1 and another part the data
as of time t2). This can be avoided by using snapshots, versioning
and timestamps in the tables, so that all queries use the same data.
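
Here's a hedged sketch of the timestamp variant (hypothetical schema):
every query of the calculation filters on the same snapshot time
recorded at the start, so rows added later don't leak into the result:

import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (item TEXT, price REAL, recorded_at TEXT)")
conn.execute("INSERT INTO prices VALUES ('widget', 9.99, ?)",
             (datetime.now().isoformat(),))
conn.commit()

# Fix the snapshot once, at the start of the long-running calculation.
snapshot = datetime.now().isoformat()

# A concurrent writer adds a newer price while the calculation runs ...
conn.execute("INSERT INTO prices VALUES ('widget', 12.50, ?)",
             (datetime.now().isoformat(),))
conn.commit()

# ... but every part of the calculation filters on the same snapshot,
# so all of them work with identical data.
query = "SELECT price FROM prices WHERE item = ? AND recorded_at <= ?"
part1 = conn.execute(query, ("widget", snapshot)).fetchall()
part2 = conn.execute(query, ("widget", snapshot)).fetchall()
assert part1 == part2     # both parts of the calculation see the same rows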

And as if this were not enough, you often run into conflicts during
the commit phase due to changes made by others to the same tables.

These can usually only be avoided by implementing merge strategies
in your application, unless you want to lock out all other users
during the transaction ... which would bring you back to the
original problem.
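
A common building block for such merge strategies is optimistic
concurrency with a version column (again just a sketch with a made-up
schema): the UPDATE only succeeds if nobody else changed the row since
it was read, and a zero row count tells the application to re-read and
merge:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, balance REAL, version INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100.0, 1)")
conn.commit()

def update_balance(conn, account_id, new_balance, seen_version):
    """Apply the change only if the row is still at the version we read."""
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, seen_version))
    conn.commit()
    return cur.rowcount == 1          # False -> somebody else got there first

# Read the row, remember its version, then try to write it back.
balance, version = conn.execute(
    "SELECT balance, version FROM accounts WHERE id = 1").fetchone()

if not update_balance(conn, 1, balance + 10, version):
    # Conflict: re-read the current state and merge/retry in the application.
    pass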

In summary: long-running transactions are not easy to get right :-)

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jun 03 2008)
 >>> Python/Zope Consulting and Support ...        http://www.egenix.com/
 >>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
 >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2008-07-07: EuroPython 2008, Vilnius, Lithuania            33 days to go

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


    eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
     D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
            Registered at Amtsgericht Duesseldorf: HRB 46611


