Understanding search queries, semantics, and "Meaning" ...aren't we all looking for meaning?

5lvqbwl02 at sneakemail.com 5lvqbwl02 at sneakemail.com
Tue Dec 30 15:35:34 EST 2008


I have Section 4.4.1 of SICP rattling around in my head (database
queries), and I'm trying to come up with a simple dictionary-based
database in Python to represent circuit diagrams.  My main confusion
isn't one of implementation, but a matter of "big thinking",
fundamentally, about the problem. Please don't suggest using a SQL
library, as I'm looking to learn to fish, so to speak, and to learn a
bit about the biology of fish.

I've subclassed dict to hdict ("hashable dict") and rewritten the
__hash__ function so I can include a dict into a set. Thanks to a
previous poster here for providing that suggestion.

A circuit component looks like this for example:

comp1 = hdict(value=1e-6, footprint='0402', vendor='digikey')
comp2 = hdict(value=10e3, footprint='0603', vendor='mouser')
etc, etc.

The database holds the values like this:
db = dict() # normal dict
db['value'] = ([1e-6, 10e3], [comp1, comp2]) #ordered set for fast
lookup/insertion
db['footprint'] = {'0402':set[comp1], '0603':comp2} # unordered uses
normal dict for fast lookup
db['vendor'] = {'digikey':[comp1], 'mouser':[comp2]}

So basically the keys are the component parameters, and the values is
the list of components with that value.  Stuff that is comparable is
ordered; stuff that is discrete is not ordered, using either 2-tuples
or dicts, respectively.

This allows extremely fast lookup of components based on their
properties, with O(1) performance for non-ordered stuff, and O(log n)
performance for ordered stuff (using bisect methods).  The set
operations are extremely fast, so I can do AND and OR operations on
compound queries of this nature without worry.

I need this speed not so much for selecting components when the user
types in a query, but for when the mouse is hovering over the
schematic and I need things to light up underneath, if the GUI is
generating hundreds of mouse messages per second, and for inspector
windows to appear quickly when you click, etc.  If you have ever used
Altium, you can see how effective this is in terms of creating a good
interactive user experience.

My question is what happens when I choose to search for NOT
footprint='0402'.

Should this return a blank list? This happens by default., and is in
fact true: a blank list satisfies "not anything" actually.
Should this return everything that is NOT footprint 0402 (ie returns
0603 components)?  This *strongly implies* a pre-selection of *all*
objects before performing the negating function, or of looking at the
ordering of other queries in the overall search term, and then
applying NOT to the results of another query.

But I'm suspicious of a brute force preselection of all objects
whenever I see a NOT, and anyway it messes up the clean query/combiner
method of search I'm doing, and it requires an implied sequence of
search, where I'm pretty sure it should not rely on sequencing.  Even
though this is single threaded, etc., the semantics of the query
should not rely on ordering of the search term:

footprint='0402' and NOT vendor='mouser' should return the same as
NOT vendor='mouser' and footprint='0402'.

So this is my philosophical quandary.  I'm not sure what the correct
thing is.  In SICP they are using nondeterministic stuff which I don't
quite get, so it's hard to follow.  Also they are not using
dictionaries and hashes, so I'm not sure if their generate-and-test
method would work here anyway.  Generate-and-test seems extremely
inefficient.

Can a wise guru please enlighten me?

thanks
Michael



More information about the Python-list mailing list