Pointrel Data Repository System for Python

Paul Fernhout pdfernhout at kurtz-fernhout.com
Fri May 26 00:04:37 EDT 2000


Shae-

Thanks for the great comments!

Shae Erisson wrote:
> 
> Paul Fernhout wrote:
> >
> > This is the very first (rough!) version of the Pointrel Data Repository
> > System.
> 
> This sounds like a three-way linked list with indexes. Am I close?
> The only difference I see is that a linked list is generally thought of
> as a linear construct whereas this looks more like spaghetti.

Very insightful and concise, with one minor quibble that "list" might
more accurately be "lists".

Yes, I guess that might be in a way a technically accurate description,
depending on how broadly one interprets "lists". Obviously Lisp heavily
uses lists, and it can do a lot!

Like you said, "list" would to me commonly mean something that was
intended to be primarily sequential. The Pointrel system allows one to
support arbitrary structures. Obviously Lisp supports structures too,
made of list elements, and in practice Lisp lists made of two-element
pairs are often used to build trees to define structures.

Also, "list" implies to me perhaps one list. When using the Pointrel
system, you can have multiple structures defined by triads which are not
connected in any way, but which you can navigate through if you know the
location of an item in the structure. In practice you might anchor these
to a common root with some symbolic names to find what you stored again
using a text key; so in that way most data might be seen as connected,
and in that sense a "list" if you allow "list" to include something that
is more like a tree or a web.

Also, "indexes" is probably appropriate, but doesn't quite catch the
flavor of being able to easily find all the connections to any triad in
the system and the implications in terms of transparency of the system.

This implementation of the Pointrel system supports creating and
traversing webs of nodes (called triads). The nodes can have with from
zero up to three ordered (A,B,C) links each. Each node can have an
arbitrary binary string associated with it. One can always easily find
all the nodes that have linked to any specific node from any particular
ordered slot (A,B,C). 

So, could this be called more or less supporting "three-way linked
list(s) with indexes"? I guess so. 

Could it be called "spaghetti"? I guess so too (depending on how you
link up the triads in your repository)!

> How is this different from something like Gadfly[1]?
> [1] http://www.chordate.com/gadfly.html

Here's an analogy. Imagine having a hundred and twenty-two thousand
miles of string. :-)

Having it all in one piece is like having a relational database
management system (RDBMS), like Gadfly. You can cut that string into
many strings of whatever length you want, and then use those lengths of
string for tying up all sorts of things, using as a leash for your pet
cat Eric (if you've got the proper license from the man in the van), or
representing weights and measures, and so on. Everyone knows the value
of a long continuous piece of string. It's a bit dangerous to keep
scissors around for cutting the string of course, but the latest
scissors the nice people give out :-) have blunt edges so it's not too
bad. It is a pain to keep hauling around a mountain of string though,
but you do it because once you cut off a length, it's hard to reuse it,
and you want to get the most efficiency out of all that precious string.

Now imagine chopping up that string into three-inch lengths. 
That's the Pointrel Data Repository System. :-) 

Now, before one calls this "bad planning", consider the following:

Fold over the string and tie a knot in the middle.

This wil produce something that looks vaguely like this asciii art:

     ------
====K
     ------

These super water absorbent triadal "stringettes" can now be easily tied
together to make all sorts of interesting webs without requiring any
additional cutting. So, the nice people don't have to worry about
providing dangerous sharp devices. You never has to make a big
commitment to cutting a specific length of string, and you can easily
carry around just a small pile of stringettes and get more if you run
out. You never have to worry about keeping track of unusual lengths of
string cut to useless lengths due to bad planning -- since all your
stringettes are a uniform size, which makes life a lot easier all
around! :-)

What's more, with a microprint pen, one can write on the knot, thus
saving the trouble of encoding text via more stringettes tied to the
knot.

Further, if you assume a non-euclidian infinite-dimentional space where
all string knots are at most apparently less than 1.5 inches apart, you
can tie the end of any stringette to the knot of any other stringette,
allowing your stringettes to represent arbitrary structures of
information.

Stringettes for everyone!

And now for something completely different....

Gadfly is a great system. I had used it briefly as part of learning to
use SQL under Zope. 

As I understand it, Gadfly is basically a relational database
implemented in Python with support for a large subset of SQL. It loads
tables into memory before operating on them, which has various
implications.

So, the question is perhaps more, "how is the Pointrel system different
from a typical relational database"? William Kent discusses issues like
this at length in his book "Data & Reality".
It is in large part an issue of conceptual emphasis.

Let me try to explain this difference by giving my take on what types of
activities are assumed to be "common", "occasional", or "rare" in the
two types of systems.

Relational databases are designed for storing large numbers of similar
"relations" (table rows). Changing the types of things stored in a
relational database is a rare event. "Introspection" of database
elements referring to other database elements is also rare. Changing a
value in a relation is common event, and adding new data is an
occasional event [might be common in some cases]. Processes that need to
analyze a significant portion of the data in the database are rare.
Deleting data is done occasionally.

The Pointrel Data Repository System is designed for storing large
numbers of ad-hoc relations, where changing the types of things stored
in the database is a common event (but you still want to get at legacy
data), "introspection" of database elements referring to other elements
is also common, and changing a value in a relation is an occasional
event, and adding new data is a common event. Processes that need to
analyze a significant portion of the data in the database are common.
Deleting data is done rarely.

Summary chart:
      
Activity            RDBMS       Pointrel System
===============================================
Changing Types      Rare        Common
Introspection       Rare        Common
Changing Values     Common      Occasional 
Adding data         Occasional  Common
Analyze Lots        Rare        Common
Deletion            Occasional  Rare

Some other differences:
* The Pointrel system puts all data in one file (well, really two);
typically a RDBMS use one file per table.
* The Pointrel system never deletes; a typical RDBMS allows deletion
(which may require packing). [Deletion can be approximated in the
Pointrel System, but only by marking triads deleted and checking for
that mark at the application level, creating an alternate structure
including only the non-deleted data and working exclusively with the
alternate, or cloning the repository without the deleted data].
* The Pointrel system has an absolute minimum of base concepts (triad
and string); a RDBMS typically has many base concepts to learn (like
table restructuring, allowable data types, indexing methods, etc.).
Succesfully using the Pointrel system does require learning many new
concepts, but those are more at the application level (i.e. how to
represent and retrieve application specific data in useful ways.)
* The Pointrel system uses triads to allow arbitrary higher relational
layers to be built; a RDBMS typically implements relations as arrays of
values in files.
* The Pointrel database makes it easy and fast to dive recursively into
relations; a RDBMS with SQL typically is not designed to support queries
where one is taking a value from a row (relation) and using that value
to drill down into rows (relations) of the same table (type). [You can
do it, but I not easily.]
* The Pointrel system leans towards supporting information retrieval
where the information retrieved may change what additional information
is retrieved and what it means (by encouraging fine grained traversal);
a RDBMs with SQL leans towards supporting information retrieval where
the major variable is how many records might come back and you know what
the structure of the query result will be (by what SQL's syntax makes
convenient).
* The Pointrel system could easily (not implemented yet) support doing
reports on a consistent view of the database by ignoring all triads
after a certain location; a RDBMS has to work hard to provide a
consistent view (if it can) given all the various tables and such which
may be undergoing changes while the report is running.

The core Python source file for the Pointrel System is only 20K, but I
hope you'll agree it can do quite a bit for such a small size and thus
has a high usefulness per byte ratio. Of course, it can't quite match
the ratio of the Forth "idea", which can fit a whole 
OS/Compiler/Linker/Editor in about 1K.

I'm sure Pointrel in its current initial state will suffer from the same
tradeoffs you get when you use a 1K Forth, compared to a 1MB Forth with
lots of bells and whistles. Pointrel will be far more useful when it has
lots of Python scripts for importing and exporting data, for searching
the data, and for doing other interesting things (versioning, supporting
a reliable file system, etc.). But the fact that you can go anywhere
with a 1K Forth says there is an elegant idea there. I hope that someday
one might say that about the Pointrel Data Repository System vs. a
typical RDBMS. 

It may always be possible to say a RDBMS does things you want to do
better than the Pointrel Data Repository System, just like I could say
Python does many things I want to do more easily than Forth. If I was
fitting an interpreter into 8K, I'd probably pick Forth over Python. If
I was fitting an intepreter into 1MB, I'd probably pick Python. So, my
hope is to shape the Pointrel system so that it fits some niche very
well. Right now I think that niche is as a flexible data repository.
Maybe some other niche would be better?

In any case, I should look a lot more at Gadfly. It might be possible to
merge the Pointrel system with the Gadfly SQL parser and related search
code, to make a version of Gadfly that didn't have to load tables in
memory. This version would have many of the performance tradeoffs
Pointrel has (flexible but much slower for routine things), but it might
be interesting...

> The idea somewhat reminds me of natrificial's TheBrain software. I guess
> arbitrary relationships is one of its strong points?

I looked at a few pages on the site http://www.thebrain.com but it
wasn't immediately obvious to me how the implemented what they have.
Their intent does sound similar: "TheBrain gives people the ability to
organize disparate pieces of information into one meaningful structure
that conveys valuable relationships in an easily understandable
display." 

Yes, handling arbitrary relationships is (hopefully) a strong point of
the Pointrel Data Repository System. I have implemented versions of the
Pointrel system focused on n-ary (tuple) relations (not triads), but I
feel the triads provide more flexibility, simplicity, and speed, while
not loosing that much of the power arbitrary length tuples provide
(since you can build more complex relations from triads).

There are many systems people have developed to store arbitrary
relationships.
Many have been done in the past by AI developers (typically with Lisp).

Things really clicked for me though when I was sitting in a tutorial
about Lisp (around 1982) and I realized "Aha! Lisp culturally is about
lists and structures; what I want to do is more about pointers and
relationships (hence, "Pointrel".)

> No disrespect meant, I'm just trying to understand more about Pointrel.

None taken. Thanks again for the great questions.

-Paul Fernhout
Kurtz-Fernhout Software 
=========================================================
Developers of custom software and educational simulations
Creators of the Garden with Insight(TM) garden simulator
http://www.kurtz-fernhout.com



More information about the Python-list mailing list