SQL/Python question -- slow... What is the fixed cost?

Steve Holden sholden at holdenweb.com
Thu Oct 4 23:43:34 EDT 2001


"Leonardo B Lopes" <leo at iems.nwu.edu> wrote in message
news:3BBCF0D0.29A3CB9E at iems.nwu.edu...
> My DB is in 3NF. And I know that 83K isn't that big, but it is big
> enough to make a query last .0008 CPU seconds when joined with 2 other
> tables (that is the actual amount, before rounding). The problem is
> that a query on a table with only 12 rows also takes about the same
> time. So I need to know how much time the library itself is spending
> on data checking, the network connection, etc.
>
> Actually today I created the following db:
>
> CREATE DATABASE ATOM;
> CREATE TABLE atomic (id MEDIUMINT PRIMARY KEY);
> INSERT INTO atomic (id) VALUES (3);
>
> and the following Python code:
>
> #!/usr/bin/env python
>
> import MySQLdb
> import profile
>
> db = MySQLdb.connect(passwd='Iwanit2')
> c = db.cursor()
>
> def test():
>     for i in range(0,10000):
>         c.execute('select * from atomic where id=3')
>         x = c.fetchall()
>
> c.execute('use atom')
> profile.run('test()')
>
> and here are the (best) results:
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>         1    0.000    0.000    5.250    5.250 <string>:1(?)
>     10000    1.600    0.000    2.860    0.000 cursors.py:110(__do_query)
>     10000    0.210    0.000    0.210    0.000 cursors.py:137(_fetch_row)
>     10000    0.390    0.000    0.390    0.000 cursors.py:147(_check_for_warnings)
>     10000    0.400    0.000    0.690    0.000 cursors.py:162(_get_result)
>     10000    0.530    0.000    3.600    0.000 cursors.py:169(_query)
>     10000    0.310    0.000    0.420    0.000 cursors.py:193(fetchall)
>     10000    0.110    0.000    0.110    0.000 cursors.py:36(_check_executed)
>     30000    0.690    0.000    0.690    0.000 cursors.py:46(_get_db)
>     10000    0.660    0.000    4.480    0.000 cursors.py:51(execute)
>         0    0.000             0.000          profile:0(profiler)
>         1    0.010    0.010    5.260    5.260 profile:0(test())
>         1    0.350    0.350    5.250    5.250 testdb.py:10(test)
>
> Apparently, .0004 sec is a lower bound on how fast a query can be, and
> that is not good news for me.
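
(As a check of that figure: the profile above charges 4.480 sec cumulative
to execute() over 10,000 calls, i.e. 4.480 / 10000 = 0.000448 sec per call,
which is where .0004 comes from. Note also that profile adds bookkeeping to
every call it measures, so the real floor is, if anything, slightly lower.)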
>
> Now with this small change to the script:
>
> #!/usr/bin/env python
>
> import MySQLdb
> import profile
>
> db = MySQLdb.connect(passwd='Iwanit2')
> c = db.cursor()
>
> def test():
>     for i in range(0,10000):
>         c.execute('select * from mematomic where id=3')   # changed
>         x = c.fetchall()
>
> c.execute('use atom')
> c.execute(' CREATE TABLE mematomic (id MEDIUMINT PRIMARY KEY) \
>             TYPE = HEAP')                                 # added
> c.execute(' INSERT INTO mematomic (id) VALUES (3)')       # added
> profile.run('test()')
>
> and the new results:
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>         1    0.000    0.000    5.920    5.920 <string>:1(?)
>     10000    1.440    0.000    2.910    0.000 cursors.py:110(__do_query)
>     10000    0.200    0.000    0.200    0.000 cursors.py:137(_fetch_row)
>     10000    0.520    0.000    0.520    0.000 cursors.py:147(_check_for_warnings)
>     10000    0.600    0.000    0.760    0.000 cursors.py:162(_get_result)
>     10000    0.650    0.000    3.760    0.000 cursors.py:169(_query)
>     10000    0.380    0.000    0.500    0.000 cursors.py:193(fetchall)
>     10000    0.120    0.000    0.120    0.000 cursors.py:36(_check_executed)
>     30000    0.560    0.000    0.560    0.000 cursors.py:46(_get_db)
>     10000    1.000    0.000    4.970    0.000 cursors.py:51(execute)
>         0    0.000             0.000          profile:0(profiler)
>         1    0.010    0.010    5.930    5.930 profile:0(test())
>         1    0.450    0.450    5.920    5.920 testdb.py:9(test)
>
> So even with an in-memory (HEAP) table, the overhead hasn't changed
> much. Comments?
>
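
Both profiles put essentially all of the time in MySQLdb's cursor machinery
(execute, __do_query, _check_for_warnings) rather than in table access, which
is consistent with the HEAP table making no difference. One rough way to see
that floor directly is to time a statement that touches no table at all. A
sketch, assuming the same server (the credentials are placeholders), timed by
wall clock rather than profile, since the profiler adds its own per-call cost:

    import time
    import MySQLdb

    db = MySQLdb.connect(passwd='...')   # placeholder credentials
    c = db.cursor()

    N = 10000
    start = time.time()
    for i in range(N):
        c.execute('select 1')   # no table access: round trip + driver cost only
        c.fetchall()
    print('floor per query: %.6f sec' % ((time.time() - start) / N))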
I think about the only comment I'm prepared to make is that if 400
microseconds isn't good enough retrieval performance, you should not be
using relational technology at all. When I suggested using memory, I didn't
mean you should use memory-based relations; I meant you should use Python
data structures.
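
For the single-key lookup being profiled, a minimal sketch of that idea
(assuming the table fits in memory and doesn't change during the run; the
credentials are placeholders) is to pay the round-trip cost once and serve
every subsequent lookup from a dictionary:

    import MySQLdb

    db = MySQLdb.connect(passwd='...')   # placeholder credentials
    c = db.cursor()
    c.execute('use atom')

    # One round trip to load the whole table; after that every lookup
    # is a local dictionary probe with no network or cursor overhead.
    c.execute('select id from atomic')
    atoms = {}
    for (key,) in c.fetchall():
        atoms[key] = key   # store full row tuples here if there are more columns

    def lookup(key):
        return atoms.get(key)

    print(lookup(3))

A dictionary probe runs far below the 400-microsecond per-query floor
measured above, so for read-mostly data the fixed cost simply disappears.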

If performance is that critical, relational structures simply aren't
appropriate: every query pays for the client/server round trip and the
driver's per-call machinery, no matter how small the table is.

regards
 Steve
--
http://www.holdenweb.com/