relation class

Sun Apr 26 05:06:51 EDT 2009

On Apr 24, 1:18 am, Aaron Brady <castiro... at gmail.com> wrote:
> On Apr 22, 11:34 pm, Aaron Brady <castiro... at gmail.com> wrote:
>
> > On Apr 22, 11:52 am, Aaron Brady <castiro... at gmail.com> wrote:
>
> > > On Apr 22, 12:09 am, Chris Rebert <c... at rebertia.com> wrote:
>
> > > > On Tue, Apr 21, 2009 at 5:51 PM, Aaron Brady <castiro... at gmail.com> wrote:
> > > > > Hi all,
>
> > > > > I think Python should have a relation class in the standard library.
> > > > > Fat chance.
>
> > > > Perhaps I'm not understanding "relation" correctly, but are you not
> > > > aware ofhttp://docs.python.org/library/sqlite3.html?
>
> > > > Cheers,
> > > > Chris
> > > > --
> > > > I have a blog:http://blog.rebertia.com
> > snip
> > > It only supports numbers and strings.
>
> > snip
>
> > My point is that in undirected relations, it's redundant to maintain
> > reciprocal membership.
> snip
> > Here is another sample:
>
> > Tree= Relation( ( "parent", "child", "direction" ) )
> > nodes= [ object( ) for _ in range( 10 ) ]
> > Tree( ( nodes[ 0 ], nodes[ 1 ], "left" ) )
> > Tree( ( nodes[ 0 ], nodes[ 2 ], "right" ) )
> snip
> > 'select' would need the context
> > of its caller, which isn't available.
>
> snip
>
> Or, pass 'locals()'.
snip
>
> > I think some kind of markers would have to replace any identifiers it
> > would use:
>
> > recordset= Tree.select( [ "child" ],
> >     "parent is %var and direction=='left'", nodes[0] )
>
> snip
snip
> class TreeNode:
>     relation= Tree
>     left= Relation.getter(
>         "child", "parent is ? and direction=='left'" )
snip
> It only allows one '?' in the
> definition, since getter functions only take one argument: self!

This isn't true.  The query could have multiple parameter arguments.
Since 'self' isn't defined when calling the property creation
function, the question mark will need a special flag, or an argument
will have to be a sentinel object.

    left= Tree.getter(
        "child", "parent is ?self and direction==?", "left" )

Or:

    left= Tree.getter(
        "child", "parent is ? and direction==?", self_mark, "left" )

On doing my homework, I find parameters can be named as well:

    left= Tree.getter(
        "child", "parent is :self and direction==:dir",
        { 'dir': "left" } )

'self' would be reserved.

snip
> You might want 'getter', 'getiter', and 'getone' methods, which return
> a set of, an iterator over, and just the field of an arbitrary one of
> the matching records respectively.
snip

> > > > I want to return an
> > > > iterator from the following function calls.

It isn't clear that a bare iterator is the proper tool.  Taking
'columns' to be fields, and 'rows' to be records, we have four cases
of the return from a select statement.

- One row, one column.  'getter' and 'setter' can take a single-object
value.  'deller' merely deletes the record.
- One row, multiple columns.  'getter' and 'setter' return and accept
dictionaries which specify the values of the columns in the given
record.  'deller' deletes the record.  It's possible to create nested
properties which correspond to the unspecified fields, but you may
lose identifier fidelity.  Is a dictionary prohibitively inconvenient?
- Multiple rows, one column.  The return from 'SELECT' should have
iterator semantics, as well as set operations.  'getter' returns the
object, but no 'setter' or 'deller' descriptors are meaningful.  The
'add' method on the iterator/set should insert a new record that
fulfills the rest of the statement along with the value added; there
is one degree of freedom in the query.  'discard', 'remove', 'pop',
etc., should all do analogous things.  The methods may have to be
implemented from scratch, due to the fact that they are short-hand for
operations on the relation with larger tuples, as well as
corresponding entries in the hashes and trees.
- Multiple rows, multiple columns.  The return from 'SELECT' should be
an iterator/set.  Arguments to the set operations should be
dictionaries as above.

The 'children' descriptor of the 'Tree' relation corresponds to the
last case:

class TreeNode:
    children= Tree.getter(
        "child", "parent is ?self" )

A call to 'list( nodeA.children )' would return (simulated,
formatted):
[
  { 'child': nodeB, 'direction': 'left' },
  { 'child': nodeC, 'direction': 'right' }
]

'nodeA.children' would be a non-hashing set, which set's members would
be dictionaries.  The dictionaries would either have to be read-only,
or specialized to pass changes to them on to the structure.  Two
sample operations on the 'children' iterator/sets:

nodeA.children.discard( { 'child': nodeB, 'direction': 'left' } )

To merely discard the left node or the 'nodeB' node would be better
suited to a different property or query ('del nodeA.left').  I don't
expect that the set operations would be used much in the 'multi-
column, multi-row' case; they could be susceptible to abuse, and may
be better omitted.

nodeA.children.clear( )

This removes both children from 'nodeA'.  It may be eliminated from
the relation by now, though the get descriptor 'nodeA.children' will
still return an empty iterator/set object.  Should it be the same one,
or volatile, like bound method objects?

The mention of 'hashes and trees' reminds me that not all types are
comparable in Python 3.  This means each disjoint set of comparable
types will need its own separate tree, and the object's hash entry may
need to indicate privately what tree it's in.  As you can see, columns
are freed from their static typing as SQL mandates they have.  A
statement like, "SELECT * FROM Table WHERE X>100 OR X>objectA" would
perform two tree queries, one on the tree the elements of which can be
compared to 100, and the other on that which can be compared to
'objectA'.  This is an optimization.

I am still operating on the premise that selection ('WHERE') clauses
must be simplified expressions, to permit the relation to run a fast
query.  Perhaps it could be combined with special filters, such as a
function call.

    prop= SomeRel.getter(
        "fieldA", "fieldA> 600 and ?( ?self )", funcA )

Here, fieldA is compared first in a binary search, then 'funcA' would
be called on each remaining record.  What are your thoughts?  Should
the optimization break left-to-right precedence?  That is, should
'fieldA' be filtered first, or 'funcA' called first if their order is
reversed?

    prop= SomeRel.getter(
        "fieldA", "?( ?self ) and fieldA> 600", funcA )

One obstacle is that '?' can't be filtered generically.  The
alternatives are to use designated identifiers, and consequently
invalidate some field names.  '?' could just be passed as a parameter
where needed [1].  Or, only inject an argument when the syntax takes
an identifier, that is, outside quotes.

[1] http://www.sqlite.org/c3ref/bind_parameter_name.html
http://www.sqlite.org/c3ref/bind_blob.html

Another obstacle is that where dictionaries and other mutable types
are present in a relation, the clauses 'WHERE X=={...}' and 'WHERE X
is {...}' would produce different results.  (The notation is shorthand
for "select( 'X==?', {...} )".)  The potential for objects to change
their values throws a wrench into their participation in hashes for
equality and identity, and trees for inequality.  Not all objects can
participate in the optimization, and searches are back to good old
linear.  It's not clear how to detect this on the fly: references to
members of the relation can exist outside its control.  Several
strategies are merely to 'void the warranty' when the program mutates
participants; offer a 'update( ob= None )' method which must be called
upon mutations; offer additional arguments to 'add' methods, including
'__call__', which indicate whether the objects can be hashed and
'treed'; offer a 'unmutable' 'object decorator', e.g. "Tree( mutable
( nodeA ), immutable( nodeB ), 'left' )"; or follow the lead of,
oddly, the '__hash__' method and the 'bisect' module, which leave it
to the programmer to define an appropriate hash codes "...correctly
identified as unhashable...", and make no warranties about changing
contents once present in a container.  That is, any item can
potentially participate in 'bisect', and the programmer can break
sorted order by mutating the object, and shouldn't use an inequality
'select' if s/he does.  (In other words: The Usual(tm)!)