Best database to use with Python

Dave Cole djc at itga.com.au
Thu Apr 27 20:37:27 EDT 2000


>>>>> "Tim" == Tim Lavoie <tim.lavoie at mts.net> writes:

>> to lack of transaction support), but the most startlingly fast one
>> was bulk load, which does you no good in a CGI situation. Again,
>> from memory, MySQL was fast at very simple queries, but had a very
>> limited SQL dialect and fell apart with fancier queries.

Tim> I did a work project involving a search engine of sorts, with a
Tim> crawler creeping through the data at night, and a CGI front-end
Tim> to look for one or more words. The first iteration used Postgres,
Tim> but was slow.... The crawler took forever on inserts, and since
Tim> the data changed often, there were a lot of them. Queries for
Tim> single words were acceptable, but multi-word 'OR' searches were
Tim> painfully slow. In this case, MySQL fit the bill very well; data
Tim> was added very quickly, and queries were extremely fast. Also, we
Tim> didn't need transactions or anything of the sort, since at the
Tim> very worst we'd crawl the whole works again.

I have written a mostly DB-API 2.0 compliant interface to Sybase.  The
freely available 11.0.3 release of Sybase from linux.sybase.com was
used for development.
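
For ordinary (non-bulkcopy) work the module is used like any other
DB-API style driver.  Here is roughly what that looks like - the
cli_order table and the connect arguments are just placeholders, and I
am assuming the standard cursor methods here:

    import Sybase

    # connect without bulkcopy mode for normal SQL work
    db = Sybase.connect(server, user, password)
    c = db.cursor()
    c.execute('select count(*) from cli_order')   # any ordinary SQL statement
    count, = c.fetchone()
    print count
    db.close()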

One of the things that we needed here was fast inserts of large
amounts of data, so I extended the interface to support the bulkcopy
import interface.  This allows you to do things like:

    import string
    import Sybase

    db = Sybase.connect(server, user, password, bulkcopy = 1)
    bcp = db.bulkcopy('cli_order')
    file = open(orders_file)
    while 1:
        line = string.rstrip(file.readline())
        if not line:
            break                # stop at EOF (or a blank line)
        bcp.rowxfer(line)        # each line is a CSV record of column values
    bcp.done()                   # flush the batch to the server

Inserts are blindingly fast.  I just wrote a little test program to
show the speed.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
import time
import Sybase

start = time.time()
db = Sybase.connect('SYBASE', 'djc', 'password', bulkcopy = 1)
db.execute('create table #test (col1 varchar(80), col2 int, col3 char(20))')
bcp = db.bulkcopy('#test')
for i in xrange(100000):
    bcp.rowxfer(('a string', 20, 'another string'))
bcp.done()
end = time.time()
c = db.cursor()
c.execute('select count(*) from #test')
num, = c.fetchone()
print '%d rows in %.2f seconds, or %.0f rows / second' \
      % (num, end - start, num / (end - start))
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

The program output:

100000 rows in 8.70 seconds, or 11500 rows / second

I am running this on a PC with a P-III/450, 128MB of RAM, an IDE disk,
and the Linux 2.2 kernel.

The nice thing about the rowxfer method is that you can pass in a CSV
string, which is internally split into column values.  You can also
pass a list or tuple of column values.
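
For example, both of these should transfer the same row into the #test
table from the program above (exact CSV quoting rules aside):

    bcp.rowxfer('a string,20,another string')
    bcp.rowxfer(('a string', 20, 'another string'))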

For the interested, the module can be found at:

        http://www.itga.com.au/~djc/sybase.html

- Dave


