Dictionary used to build a Triple Store

Steve Holden steve at holdenweb.com
Thu Jan 7 13:54:05 EST 2010


Lee wrote:
> Definitely a newbie question, so please bear with me.
> 
> I'm reading "Programming the Semantic Web" by Segaran, Evans, and Taylor.
> 
> It's about the Semantic Web BUT it uses Python to build a "toy" triple
> store claimed to have good performance in the "tens of thousands" of
> triples.
> 
> Just in case anybody doesn't know what an RDF triple is (not that it
> matters for my question), think of it as an ordered 3-tuple representing
> a Subject, a Predicate, and an Object, e.g. (John, loves, Mary), (Mary,
> has-a, lamb), (theSky, has-color, blue).
> 
> To build the triple store entirely in Python, the authors recommend
> using Python hashes (dicts). Three hashes, actually. (I get that: you
> want one hash whose major index is the Subject, another whose major
> index is the Predicate, and a third whose major index is the Object.)
> 
> The authors create a class SimpleGraph which initializes itself by
> setting up three hashes named _spo, _pos, and _osp, thus:
> 
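> class SimpleGraph:
>     def __init__(self):
>         self._spo = {}  # subject -> predicate -> objects
>         self._pos = {}  # predicate -> object -> subjects
>         self._osp = {}  # object -> subject -> predicates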
> 
> So far so good. I get the convention with the double underbars for the
> initializer but
> 
> Q1: Not the main question, but while I'm here... I'm a little fuzzy on
> the convention about the use of the single underbar in the names of
> the hashes. Is the idea to "underbar" all objects and methods that
> belong to the class? Why do that?
> 
> But now the good stuff:
> 
> Our authors define the hashes thus: (showing only one of the three
> hashes because they're all the same idea)
> 
> self._pos = {predicate:{object:set( [subject] ) }}
> 
> Q2: Wha? Two surprises...
>    1) Why not {predicate:{object:subject}}, i.e.
> pos[predicate][object]=subject? Why the set( [subject] ) construct,
> putting the subject into a list and turning the list into a set to be the
> "value" part of a name:value pair? Why not just use the naked subject
> for the value?
> 
Because for a given predicate and object there can be many subjects, and
you need to be able to look up all the subjects associated with the same
predicate and object: (John, loves, Mary) and (Bob, loves, Mary) share the
key pair ("loves", "Mary"). I am assuming the line you quote is
initialization code; to add another subject with the same predicate and
object you would use

    self._pos[predicate][object].add(subject)
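
A minimal sketch of how all three indexes might be kept in step, assuming
hypothetical `add`, `_add_to_index`, and `subjects` methods (these are
illustrative names, not necessarily the book's code). It uses the standard
`dict.setdefault` so that the inner dict and the set are created the first
time a key is seen, and `.add()` is used thereafter:

```python
class SimpleGraph:
    """Toy triple store with three nested-dict indexes (sketch only)."""

    def __init__(self):
        self._spo = {}  # subject -> predicate -> set of objects
        self._pos = {}  # predicate -> object -> set of subjects
        self._osp = {}  # object -> subject -> set of predicates

    @staticmethod
    def _add_to_index(index, a, b, c):
        # Hypothetical helper: create index[a] and index[a][b] on first
        # use, then add c to the innermost set.
        index.setdefault(a, {}).setdefault(b, set()).add(c)

    def add(self, triple):
        # Hypothetical method: store one triple in all three indexes.
        sub, pred, obj = triple
        self._add_to_index(self._spo, sub, pred, obj)
        self._add_to_index(self._pos, pred, obj, sub)
        self._add_to_index(self._osp, obj, sub, pred)

    def subjects(self, pred, obj):
        """Return a copy of the set of subjects s with (s, pred, obj) stored."""
        return set(self._pos.get(pred, {}).get(obj, set()))


g = SimpleGraph()
g.add(("John", "loves", "Mary"))
g.add(("Bob", "loves", "Mary"))
g.add(("Mary", "has-a", "lamb"))
print(sorted(g.subjects("loves", "Mary")))  # ['Bob', 'John']
```

Because ("loves", "Mary") maps to a set, the second `add` extends it rather
than overwriting "John", which is exactly what a plain
pos[predicate][object] = subject would do.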

>    2) Why not something like pos[predicate][object][subject] = 1, or
> any constant? The idea being to create the set of three indexes: if the
> triple exists in the hash, it's "in" your triple store; if not, then
> there's no such triple.
> 
Because it's less efficient. Since each (predicate, object, subject)
triple only ever occurs once, using a dict would be unnecessarily wasteful.
Containment checks for sets are just as fast as for dicts, and you don't
need to store all those references to 1.
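
To illustrate the containment point, here is a small sketch with a
hypothetical `has_triple` helper. It shows that a membership test is spelled
the same way whether the innermost level is a set (as above) or a dict of 1s
(as in your question 2), because `in` on either is a hash lookup on the key;
the dict layout simply carries a value per key that is never used:

```python
# Innermost level of the pos index, in the two layouts being compared.
pos_set = {"loves": {"Mary": {"John", "Bob"}}}           # set of subjects
pos_dict = {"loves": {"Mary": {"John": 1, "Bob": 1}}}    # dict of 1s


def has_triple(pos, sub, pred, obj):
    """Hypothetical helper: True if (sub, pred, obj) is in the pos index.

    Missing predicates or objects fall back to empty containers, so the
    lookup never raises KeyError.
    """
    return sub in pos.get(pred, {}).get(obj, ())


for pos in (pos_set, pos_dict):
    print(has_triple(pos, "John", "loves", "Mary"),
          has_triple(pos, "Mary", "loves", "John"))
# Both iterations print: True False
```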

regards
 Steve
-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
PyCon is coming! Atlanta, Feb 2010  http://us.pycon.org/
Holden Web LLC                 http://www.holdenweb.com/
UPCOMING EVENTS:        http://holdenweb.eventbrite.com/