SQL/Python question -- slow... What is the fixed cost?

Leonardo B Lopes leo at iems.nwu.edu
Thu Oct 4 19:29:20 EDT 2001


My DB is in 3NF. And I know that 83K rows isn't that big, but it is big
enough to make a query last .0008 cpusec when that table is joined with
2 others (that is the actual amount, before rounding). The problem is that a
query on a table with only 12 rows also takes about the same time. So I
need to know how much of that time the library is spending on fixed
overhead: data checking, the network connection, etc...
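
One way to put a number on that fixed cost (just a rough sketch, reusing
the same connection setup as the scripts below; the 10000-iteration count
is arbitrary) is to time a statement that reads no table at all, so that
only the MySQLdb driver and the round trip are being measured:

#!/usr/bin/env python
# Rough sketch: time a statement that touches no table, so only the
# driver and the network round trip count.  Same connection parameters
# as the scripts below; 10000 iterations is an arbitrary choice.

import time
import MySQLdb

db = MySQLdb.connect(passwd='Iwanit2')
c = db.cursor()

N = 10000
start = time.time()
for i in range(N):
    c.execute('select 1')
    c.fetchall()
elapsed = time.time() - start
print 'fixed cost per query: about %.5f sec' % (elapsed / N)

Comparing that figure against the per-query time on the 12-row table
should show how much of the time is fixed overhead.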

Actually today I created the following db:

CREATE DATABASE atom;
CREATE TABLE atomic (id MEDIUMINT PRIMARY KEY);
INSERT INTO atomic (id) VALUES (3);

and the following python code:

      1 #!/usr/bin/env python
      2 
      3 import MySQLdb
      4 import profile
      5 
      6 db = MySQLdb.connect(passwd='Iwanit2')
      7 c = db.cursor()
      8 
      9 def test():
     10     for i in range(0,10000):
     11         c.execute('select * from atomic where id=3')
     12         x = c.fetchall()
     13
     14 c.execute('use atom')
     15 profile.run('test()')

and here are the (best) results:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    5.250    5.250 <string>:1(?)
    10000    1.600    0.000    2.860    0.000 cursors.py:110(__do_query)
    10000    0.210    0.000    0.210    0.000 cursors.py:137(_fetch_row)
    10000    0.390    0.000    0.390    0.000 cursors.py:147(_check_for_warnings)
    10000    0.400    0.000    0.690    0.000 cursors.py:162(_get_result)
    10000    0.530    0.000    3.600    0.000 cursors.py:169(_query)
    10000    0.310    0.000    0.420    0.000 cursors.py:193(fetchall)
    10000    0.110    0.000    0.110    0.000 cursors.py:36(_check_executed)
    30000    0.690    0.000    0.690    0.000 cursors.py:46(_get_db)
    10000    0.660    0.000    4.480    0.000 cursors.py:51(execute)
        0    0.000             0.000          profile:0(profiler)
        1    0.010    0.010    5.260    5.260 profile:0(test())
        1    0.350    0.350    5.250    5.250 testdb.py:10(test)

Apparently, .0004 sec is a lower bound on how fast a query can be
(execute() accumulates 4.48 cpusec over 10000 calls, which comes out to
roughly .0004 sec per call), and that is not good news for me.
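
For what it's worth, that per-call figure can also be pulled out of the
profiler data with pstats instead of read off the table above. This is
just a sketch of the same run; 'testdb.prof' is a file name I made up:

#!/usr/bin/env python
# Sketch: same test() as the script above, but save the profile data to
# a file and let pstats report execute() by itself.

import MySQLdb
import profile
import pstats

db = MySQLdb.connect(passwd='Iwanit2')
c = db.cursor()
c.execute('use atom')

def test():
    for i in range(0, 10000):
        c.execute('select * from atomic where id=3')
        x = c.fetchall()

profile.run('test()', 'testdb.prof')
stats = pstats.Stats('testdb.prof')
stats.sort_stats('cumulative').print_stats('execute')
# the cumtime/ncalls ratio on the cursors.py execute line is the
# ~.0004 sec per-query figure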

Now with this small change to the script:

      1 #!/usr/bin/env python
      2 
      3 import MySQLdb
      4 import profile
      5 
      6 db = MySQLdb.connect(passwd='Iwanit2')
      7 c = db.cursor()
      8 
      9 def test():
     10     for i in range(0,10000):
***  11         c.execute('select * from mematomic where id=3')
     12         x = c.fetchall()
     13 
     14 c.execute('use atom')
***  15 c.execute(' CREATE TABLE mematomic (id MEDIUMINT PRIMARY KEY) \
***  16             TYPE = HEAP')
***  17 c.execute(' INSERT INTO mematomic (id) VALUES (3)')
     18 profile.run('test()')
     19 

and the new results:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    5.920    5.920 <string>:1(?)
    10000    1.440    0.000    2.910    0.000 cursors.py:110(__do_query)
    10000    0.200    0.000    0.200    0.000 cursors.py:137(_fetch_row)
    10000    0.520    0.000    0.520    0.000 cursors.py:147(_check_for_warnings)
    10000    0.600    0.000    0.760    0.000 cursors.py:162(_get_result)
    10000    0.650    0.000    3.760    0.000 cursors.py:169(_query)
    10000    0.380    0.000    0.500    0.000 cursors.py:193(fetchall)
    10000    0.120    0.000    0.120    0.000 cursors.py:36(_check_executed)
    30000    0.560    0.000    0.560    0.000 cursors.py:46(_get_db)
    10000    1.000    0.000    4.970    0.000 cursors.py:51(execute)
        0    0.000             0.000          profile:0(profiler)
        1    0.010    0.010    5.930    5.930 profile:0(test())
        1    0.450    0.450    5.920    5.920 testdb.py:9(test)

So even with the table stored entirely in memory (TYPE = HEAP), the
per-query overhead hasn't changed much. Comments?
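
One thing I may try next (a rough sketch only, not something I have
benchmarked; the list of ids below is made up) is to amortize that fixed
cost by looking up many ids in a single round trip instead of one
execute() per id:

#!/usr/bin/env python
# Rough sketch: batch many lookups into one query so the fixed per-call
# cost is paid once.  'atomic' is the toy table from above; the 1000
# ids are made up.

import MySQLdb

db = MySQLdb.connect(passwd='Iwanit2')
c = db.cursor()
c.execute('use atom')

wanted = range(1, 1001)
placeholders = ','.join(['%s'] * len(wanted))
c.execute('select id from atomic where id in (%s)' % placeholders,
          tuple(wanted))
rows = c.fetchall()
# one execute() (one fixed cost) for 1000 lookups instead of 1000

And since the db is only written to once, the other obvious option is to
pay the cost up front, pull a whole table into a Python dictionary, and
do the repeated lookups in memory, which is basically the
keep-everything-in-memory idea below.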


Steve Holden wrote:
> 
> "Leonardo B Lopes" <leo at iems.nwu.edu> wrote in message
> news:3BBC8E99.56BD59FB at iems.nwu.edu...
> > Thanks Mark! Actually I am doing my best to keep my system independent
> > of mysql. And my db is quite particular: It is only written to once.
> > After that, it is only queried, and even so in a quite particular way.
> > But my db is pretty big. This instance, for example, has a table with
> > 83000+ records. It is not so much that the query joining that table with
> > 2 others takes .001 cpusec that worries me. That is OK. The problem is
> > that some other pretty simple queries also take the same time. If that
> > is always the case, then I will have to find another quite more
> > sophisticated solution for data storage, and that is what I am trying to
> > avoid.
> >
> Leo:
> 
> The best starting point for efficient database performance is a clean,
> normalized database design. I take it your database is structured in third
> normal form?
> 
> If you are *absolutely sure* that the data will not change (which I take to
> be the implication of your "only written to once") then you might
> permissibly deviate from normalized structures solely to improve query
> performance.
> 
> However, 83,000 rows is not by any stretch of the imagination a large
> database. If you could outline your DB structures and the usage, maybe you
> would get more assistance.
> 
> The particular database module need not be of concern. If you wanted, you
> could use Gadfly (although it uses deprecated libraries, it does still work)
> and keep everything in memory!
> 
> regards
>  Steve
> --
> http://www.holdenweb.com/

-- 
=======================================================================
Leonardo B. Lopes                                      leo at iems.nwu.edu 
Ph.D. Student                                             (847)491-8470
IEMS - Northwestern University             http://www.iems.nwu.edu/~leo


