[Tutor] Python - SQL paradigm (Will I need a hammer to make it fit?)
Kent Johnson
kent37 at tds.net
Wed Dec 14 12:58:02 CET 2005
Liam Clarke wrote:
> Hi all,
>
> Just contemplating.
>
> If in Python I were organising a data index along the lines of -
>
> j = {
>
> "k_word1" : ["rec1","rec2","rec3","rec4"],
> ...
> "k_wordn" :["recX","rec4"]
>
> }
>
> and I was going to find records that matched by seeing what record
> occurred in the most lists (via set intersections or similar; going to
> have a play see what works faster) selected by searching keywords...
If I understand correctly, the problem is, given a selection of
keywords, find the records containing any of those keywords, and show
the records ordered by how many keywords they contain?
If all your data fits in memory this might be a good solution. You could
pickle the sets for storage and it could be fairly fast to compile the
counts. Something like this (not tested!):
import operator
# Build a dict of counts for each record that matches the keywords
counts = {}
for kwd in search_words: # a list of desired keywords
recs = j[kwd]
for rec in recs:
counts[rec] = counts.get(rec, 0) + 1
# Show the records sorted by count
for rec, count in sorted(counts.iteritems(), \
key=operator.itemgetter(1)):
print count, rec
>
> how easily does that translate to a SQL table and query format?
The simplest thing would be to make a table of keyword, record pairs.
Then I think you could construct a query something like this (my SQL
reference is at work so this probably needs some tweaking):
select record, count(*) from keyword_table where
keyword = "k_word1"
or keyword = "k_word3"
or <etc for all the keywords you are searching>
group by record
order by count(*)
>
> If I had columns
>
> k_word1 ... k_wordn
> rec1 recX
> rec2 rec4
> rec3
> rec4
>
> is that even valid? Or does every row have to have a primary key?
It's bad database design. It implies a relation between rec1 and recX,
for example, because they appear in the same row. It will only work if
you have a fixed set of keywords. Bad, bad, bad database design ;)
>
> I've been looking at sqlcourse.com, and I'm thinking of having a table
> for each keyword, and adding each record as a primary key on that
> table.
This reduces the database to nothing more than a bunch of sets. It
hardcodes the keywords. It is awkward to access - you will end up doing
a bunch of searches in each table.
>
> Have I aroused anyone's Code Smell nose yet?
Yes.
You might be interested in this book:
http://www.aw-bc.com/catalog/academic/product/0,1144,0201703092,00.html
Kent
More information about the Tutor
mailing list