My Python/MySQL code is too slow

Dennis Peterson denpeterson at yahoo.com
Wed Jun 19 20:48:47 EDT 2002


I'm new to python and mysql specifically, but...your join's going to be slow
if you don't have an index on var1,var2,var3 in both tables. But one of the
tables is a temporary, no indexes, so if it's big, you have a problem.

I don't know if you can apply indexes to a temporary table in mysql - if
not, you could use a regular table, just build it like your temp, add an
index, run your join, that'll probably be faster. If this would be a
concurrency problem, add a userid to the key for your "temporary."

Also, I don't know for sure that mysql optimizes this way, but in some
databases at least it's best if you put your most selective column first in
the index. If there are 5 distinct var1 values, and 1000 distinct var3, make
your index var3,var2,var1. Probably you want it in the same order, both
tables.

There might be a better way, though. Same index hints apply, but if dbB has
a unique index on var1,var2,var3, you could just do this:

SELECT dbB.var1, dbB.var2, dbB.var3 FROM dbB inner join dbA
on dbB.var1=dbA.var1 AND dbB.var2=dbA.var2 AND dbB.var3=dbA.var3
group by dbB.var1,dbB.var2,dbB.var3 having count(*)=2

Hope this helps,
Dennis

"Duncan Smith" <buzzard at urubu.freeserve.co.uk> wrote in message
news:aer2s4$mp7$1 at news7.svr.pol.co.uk...
> I have already posted this problem on mailing.database.mysql as I see it
as
> a problem with my lack of knowledge of SQL.  But no responses so far, and
> maybe Python could do more of the work.  I repost the original question
> below (SQL easier to follow), and the Python code below that.  Any help
> appreciated.  TIA.
>
> ---------------------------------------------------------------------
> I have two tables (say dbA and dbB) and need to find the number of rows in
> dbB which have exactly 2 matching rows in dbA (on the selected columns).
> I'm using Python/MySQL, but the relevant SQL is below.  This is slow, and
> I'm sure it's possible without creating the temporary table and without
the
> join.  But everything I try fails (because of my limited knowledge of
SQL).
> Can anyone show me an efficient way of getting the answer?  All I need is
> the number of rows in dbB which have exactly 2 matching rows in dbA.
Thanks
> in advance.
>
> CREATE TEMPORARY TABLE tmp SELECT var1, var2, var3 FROM dbA GROUP BY var1,
> var2, var3 HAVING COUNT(*)=2;
>
> SELECT dbB.var1, dbB.var2, dbB.var3, tmp.var1, tmp.var2, tmp.var3 FROM
dbB,
> tmp WHERE dbB.var1=tmp.var1 AND dbB.var2=tmp.var2 AND dbB.var3=tmp.var3;
>
> Cheers.
>
> Duncan Smith
>
> ------------------------------------------------------------------
> import sys
> import string
> import MySQLdb
>
> def MyFunc(dbname, tblname, pert_table, variables=()):
>
>     """'dbname' is a MySQL database containing tables
>     'tblname' and 'pert_table'.  'variables' is a
>     sequence listing the relevant columns"""
>
>     variables = list(variables)
>
>     try:
>         conn = MySQLdb.connect()
>         curs = conn.cursor()
>         curs.execute('USE %s' % (dbname,))
>         if not variables:
>             curs.execute('DESCRIBE %s' % (tblname,))
>             variables = [x[0] for x in list(curs.fetchall())]
>         vars = string.join(variables, ', ')
>
>         #produce a temporary table containing all records
>         #of tblname which occur in pairs and find P
>
>         curs.execute("""CREATE TEMPORARY TABLE tmp
>                     SELECT %s FROM %s GROUP BY %s
>                     HAVING COUNT(*)=2""" % (vars, tblname, vars))
>
>         def f(s): return string.join([s + var for var in variables], ', ')
>
>         P = curs.execute("""SELECT %s, %s FROM %s, tmp WHERE %s""" %
> (f(pert_table+'.'), f('tmp.'), pert_table, string.join([pert_table + '.' +
> variables[i] + '=' 'tmp.' + variables[i] for i in range(len(variables))],
'
> AND ')))
>
>         curs.execute('DROP TABLE tmp')
>
>         curs.close()
>         conn.close()
>
>         return P/2
>
>     except MySQLdb.Error, e:
>         print "Error %d: %s" % (e.args[0], e.args[1])
>         sys.exit(1)
>
>





More information about the Python-list mailing list