[Tutor] Python - SQL paradigm (Will I need a hammer to make it fit?)

Kent Johnson kent37 at tds.net
Wed Dec 14 12:58:02 CET 2005


Liam Clarke wrote:
> Hi all,
> 
> Just contemplating.
> 
> If in Python I were organising a data index along the lines of  -
> 
> j = {
> 
> "k_word1" : ["rec1","rec2","rec3","rec4"],
> ...
> "k_wordn" :["recX","rec4"]
> 
> }
> 
> and I was going to find records that matched by seeing what record
> occurred in the most lists (via set intersections or similar; going to
> have a play see what works faster) selected by searching keywords...

If I understand correctly, the problem is, given a selection of 
keywords, find the records containing any of those keywords, and show 
the records ordered by how many keywords they contain?

If all your data fits in memory this might be a good solution. You could 
pickle the sets for storage and it could be fairly fast to compile the 
counts. Something like this (not tested!):

import operator

# Build a dict of counts for each record that matches the keywords
counts = {}
for kwd in search_words:  # a list of desired keywords
   recs = j[kwd]
   for rec in recs:
     counts[rec] = counts.get(rec, 0) + 1

# Show the records sorted by count
for rec, count in sorted(counts.iteritems(), \
      key=operator.itemgetter(1)):
   print count, rec

> 
> how easily does that translate to a SQL table and query format?

The simplest thing would be to make a table of keyword, record pairs. 
Then I think you could construct a query something like this (my SQL 
reference is at work so this probably needs some tweaking):

select record, count(*) from keyword_table where
   keyword = "k_word1"
     or keyword = "k_word3"
     or <etc for all the keywords you are searching>
   group by record
   order by count(*)

> 
> If I had columns
> 
> k_word1 ... k_wordn
> rec1           recX
> rec2           rec4
> rec3
> rec4
> 
> is that even valid? Or does every row have to have a primary key?

It's bad database design. It implies a relation between rec1 and recX, 
for example, because they appear in the same row. It will only work if 
you have a fixed set of keywords. Bad, bad, bad database design ;)

> 
> I've been looking at sqlcourse.com, and I'm thinking of having a table
> for each keyword, and adding each record as a primary key on that
> table.

This reduces the database to nothing more than a bunch of sets. It 
hardcodes the keywords. It is awkward to access - you will end up doing 
a bunch of searches in each table.
> 
> Have I aroused anyone's Code Smell nose yet?

Yes.

You might be interested in this book:
http://www.aw-bc.com/catalog/academic/product/0,1144,0201703092,00.html

Kent



More information about the Tutor mailing list