SQL/Python question -- slow... What is the fixed cost?
Leonardo B Lopes
leo at iems.nwu.edu
Thu Oct 4 19:29:20 EDT 2001
My DB is in 3NF. And I know that 83K rows isn't that big, but it is big
enough to make a query last .0008 cpusec when joined with 2 other tables
(that is the actual amount, before rounding). The problem is that a
query on a table with only 12 rows also takes about the same time. So I
need to know how much time the library is spending on data checking, the
network connection, etc...
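One rough way I can think of to isolate that fixed cost is to time a
statement that touches no table at all (e.g. SELECT 1) against the real
lookup; whatever SELECT 1 costs is essentially driver plus network. A
minimal, untested sketch of that idea, using the same connection and toy
table as the script further down:

#!/usr/bin/env python
# Rough sketch: estimate the per-call driver/network overhead by comparing
# a no-table statement against a real single-row lookup.
import time
import MySQLdb

db = MySQLdb.connect(passwd='Iwanit2', db='atom')
c = db.cursor()

def per_call(sql, n=10000):
    # average cpu seconds per execute()+fetchall() for the given statement
    t0 = time.clock()
    for i in range(n):
        c.execute(sql)
        c.fetchall()
    return (time.clock() - t0) / n

overhead = per_call('select 1')                         # no table touched
lookup   = per_call('select * from atomic where id=3')  # real lookup
print 'fixed cost ~%.6f  lookup ~%.6f' % (overhead, lookup)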
Actually today I created the following db:
CREATE DATABASE atom;
CREATE TABLE atomic (id MEDIUMINT PRIMARY KEY);
INSERT INTO atomic (id) VALUES (3);
and the following Python code:
 1  #!/usr/bin/env python
 2
 3  import MySQLdb
 4  import profile
 5
 6  db = MySQLdb.connect(passwd='Iwanit2')
 7  c = db.cursor()
 8
 9  def test():
10      for i in range(0,10000):
11          c.execute('select * from atomic where id=3')
12          x = c.fetchall()
13
14  c.execute('use atom')
15  profile.run('test()')
and here are the (best) results:
 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      1    0.000    0.000    5.250    5.250  <string>:1(?)
  10000    1.600    0.000    2.860    0.000  cursors.py:110(__do_query)
  10000    0.210    0.000    0.210    0.000  cursors.py:137(_fetch_row)
  10000    0.390    0.000    0.390    0.000  cursors.py:147(_check_for_warnings)
  10000    0.400    0.000    0.690    0.000  cursors.py:162(_get_result)
  10000    0.530    0.000    3.600    0.000  cursors.py:169(_query)
  10000    0.310    0.000    0.420    0.000  cursors.py:193(fetchall)
  10000    0.110    0.000    0.110    0.000  cursors.py:36(_check_executed)
  30000    0.690    0.000    0.690    0.000  cursors.py:46(_get_db)
  10000    0.660    0.000    4.480    0.000  cursors.py:51(execute)
      0    0.000             0.000           profile:0(profiler)
      1    0.010    0.010    5.260    5.260  profile:0(test())
      1    0.350    0.350    5.250    5.250  testdb.py:10(test)
Apparently, about .0004 cpusec is a lower bound on how fast a query can
be, and that is not good news for me.
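If that per-call cost really is fixed, the only way around it that I can
see is to amortize it over many rows per call, i.e. batch the lookups
instead of issuing one query per id. A quick sketch of what I mean (the
id list is made up; in my toy table only id=3 actually exists):

# Sketch: pay the fixed per-call cost once for many ids instead of once
# per id, by fetching them all in a single round trip.
import MySQLdb

db = MySQLdb.connect(passwd='Iwanit2', db='atom')
c = db.cursor()

wanted = range(1, 1001)                 # hypothetical ids to look up
in_list = ','.join(map(str, wanted))
c.execute('select id from atomic where id in (%s)' % in_list)
rows = c.fetchall()                     # one round trip instead of 1000
print '%d rows back in a single call' % len(rows)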
Now with this small change to the script:
  1  #!/usr/bin/env python
  2
  3  import MySQLdb
  4  import profile
  5
  6  db = MySQLdb.connect(passwd='Iwanit2')
  7  c = db.cursor()
  8
  9  def test():
 10      for i in range(0,10000):
*** 11          c.execute('select * from mematomic where id=3')
 12          x = c.fetchall()
 13
 14  c.execute('use atom')
*** 15  c.execute(' CREATE TABLE mematomic (id MEDIUMINT PRIMARY KEY) \
*** 16              TYPE = HEAP')
*** 17  c.execute(' INSERT INTO mematomic (id) VALUES (3)')
 18  profile.run('test()')
and the new results:
 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      1    0.000    0.000    5.920    5.920  <string>:1(?)
  10000    1.440    0.000    2.910    0.000  cursors.py:110(__do_query)
  10000    0.200    0.000    0.200    0.000  cursors.py:137(_fetch_row)
  10000    0.520    0.000    0.520    0.000  cursors.py:147(_check_for_warnings)
  10000    0.600    0.000    0.760    0.000  cursors.py:162(_get_result)
  10000    0.650    0.000    3.760    0.000  cursors.py:169(_query)
  10000    0.380    0.000    0.500    0.000  cursors.py:193(fetchall)
  10000    0.120    0.000    0.120    0.000  cursors.py:36(_check_executed)
  30000    0.560    0.000    0.560    0.000  cursors.py:46(_get_db)
  10000    1.000    0.000    4.970    0.000  cursors.py:51(execute)
      0    0.000             0.000           profile:0(profiler)
      1    0.010    0.010    5.930    5.930  profile:0(test())
      1    0.450    0.450    5.920    5.920  testdb.py:9(test)
So even with the table held entirely in memory (TYPE = HEAP), the
overhead hasn't changed much. Comments?
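Since the data is only written once and then just queried, I am starting
to wonder whether I should simply pull the hot tables into Python
dictionaries up front and do the repeated lookups in memory, bypassing
the driver altogether. A sketch of that idea, using my toy table
(untested, just to show what I mean):

# Sketch: load a small, read-only table into a Python dict once, then do
# the repeated lookups in memory instead of paying the per-query cost.
import MySQLdb

db = MySQLdb.connect(passwd='Iwanit2', db='atom')
c = db.cursor()

c.execute('select id from atomic')
cache = {}
for (ident,) in c.fetchall():
    cache[ident] = 1                    # could store the whole row instead

def exists(ident):
    # in-memory replacement for "select * from atomic where id=<ident>"
    return cache.has_key(ident)

print exists(3), exists(4)              # true, then false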
Steve Holden wrote:
>
> "Leonardo B Lopes" <leo at iems.nwu.edu> wrote in message
> news:3BBC8E99.56BD59FB at iems.nwu.edu...
> > Thanks Mark! Actually I am doing my best to keep my system independent
> > of mysql. And my db is quite particular: It is only written to once.
> > After that, it is only queried, and even so in a quite particular way.
> > But my db is pretty big. This instance, for example, has a table with
> > 83000+ records. It is not so much that the query joining that table with
> > 2 others takes .001 cpusec that worries me. That is OK. The problem is
> > that some other pretty simple queries also take the same time. If that
> > is always the case, then I will have to find another quite more
> > sophisticated solution for data storage, and that is what I am trying to
> > avoid.
> >
> Leo:
>
> The best starting point for efficient database performance is a clean,
> normalized database design. I take it your database is structured in third
> normal form?
>
> If you are *absolutely sure* that the data will not change (which I take to
> be the implication of your "only written to once") then you might
> permissibly deviate from normalized structures solely to improve query
> performance.
>
> However, 83,000 rows is not by any stretch of the imagination a large
> database. If you could outline your DB structures and the usage, maybe you
> would get more assistance.
>
> The particular database module need not be of concern. If you wanted, you
> could use Gadfly (although it uses deprecated libraries, it does still work)
> and keep everything in memory!
>
> regards
> Steve
> --
> http://www.holdenweb.com/
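For the record, here is roughly what I understand the in-memory Gadfly
route to look like; I have not tried it, so the API details and the
storage directory below are from my reading of the Gadfly docs and should
be treated as assumptions:

# Rough, untested sketch of the Gadfly route Steve mentions.
import gadfly

conn = gadfly.gadfly()
conn.startup('atom', '/tmp/atomdb')     # database name and storage directory
cur = conn.cursor()
cur.execute('create table atomic (id integer)')
cur.execute("insert into atomic (id) values (3)")
conn.commit()

cur.execute('select * from atomic where id = 3')
print cur.fetchall()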
--
=======================================================================
Leonardo B. Lopes leo at iems.nwu.edu
Ph.D. Student (847)491-8470
IEMS - Northwestern University http://www.iems.nwu.edu/~leo