[DB-SIG] mxODBC performance question

Sun, 28 May 2000 21:19:50 -0400

I've been converting over some legacy Perl scripts that handle internal
database synchronization to use Python, and have been using mxODBC for the
database interface (the platform is Windows NT).  I'm running into a
significant performance degredation with the move that I was hoping someone
might help shed some light on.

For some reason, the converted scripts in a production environment (where
the update data is at one site and the database is accessed remotely at
another of our sites) perform extremely poorly when compared to the existing
Perl approach.  For example, updating about 2500 rows in 3-4 tables takes
the Python script 25 minutes, whereas the Perl script does the same job in
10 minutes (!).  The sequence of delete/insert operations between the two
scripts is identical (designed that way for verification during the
transition).

The Perl script is using an older ODBC module (970208, by Dave Roth), and
for my Python script I'm using mxODBC (1.1.1, with mxDateTime 1.3.0).

The performance does not appear to be in the script itself - if I comment
out the "cursor.execute()" calls the Python script runs within a few percent
of the Perl script even for large data sets, and total script execution time
is on the order of seconds.  Both the Python and Perl scripts in testing are
run on the same client machine (same ODBC driver and configuration) and
against the same remote database.  So from what I can see the only real
difference is the Perl ODBC module versus mxODBC on the Python side, along
with whatever translation needs to be performed between the host script
environment and the ODBC interface.

The database operations are not really designed for efficiency, but that was
to be a later evolution - the sequence consists of processing records one by
one, deleting any prior record and inserting the new record.  That is, the
sequence of SQL operations looks something like:

   delete <table> where <key1> = <value1> and <key2> = <value2>
   insert into <table> (<columns>) values (<values>)

There are no parameter replacements - both of these commands are precomputed
strings containing all values expanded into the string, given in their
entirety to execute().  So I would think the overhead of converting them
from Python/Perl into the internal strings to pass on to the ODBC driver
would be small.

I did initially run into some problems with data values not seeming to get
into the target database initially (even when testing from the interactive
interpreter), and ended up using the "clear_auto_commit=0" parameter on the
connect() call.  I'm guessing our default database behavior is
non-transaction (still have to verify) which the Perl script - and thus the
Python version - was assuming, and the Perl module probably doesn't clear
that flag.

Would anyone happen to have any thoughts?  I'd really like to jettison the
Perl stuff (it's typical organic-growth scripts that is fairly
unmaintainable), but as it stands, it's extremely unlikely that I'd be able
to move away from the legacy scripts with this sort of wall-clock
performance hit.  But by the same token, given that it's so dramatically
different, I can't help wondering that I'm just missing something obvious?

Thanks for any help.

-- David

PS: I thought of trying a comparison against the older odbc driver in the
Win32 package just for kicks, but haven't had a chance yet.  Some very small
tests I had done previously on my local machine seemed to show it was even a
little slower than mxODBC, but I haven't tried a large data set.