dynamic naming for hierarchical data problem

Alex Martelli aleax at aleax.it
Thu Aug 9 05:13:32 EDT 2001


"Jeremy Jones" <dnjjones at yahoo.com> wrote in message
news:eric7.138732$um3.816234 at typhoon.jacksonville.mediaone.net...
    ...
> I mean that there will be parent-child relationships
> and that the parents and children should be aware of
> one another in their respective relationships (i.e.
> the parent should be aware of its children and
> children should be aware of their parents).  By

This implies loops-of-references -- which may be
all right, but may interfere with automatic garbage
collection and some other operations.  Just be
aware of this -- the weak-references module is
there to help you with this specific issue in
some future refactoring of your code, should you
need to eliminate the loops-of-references.

> dynamic, I mean that the names in the set of data will
> very likely be extracted from a text file or entered
> in by a user at run time.  I also mean that the number
> of levels in the hierarchy will potentially be
> variable and will be defined at run time.  I would

Fine, no problem.

> prefer this set of data to be contained in one object.

I don't get this.  You've described the general nature
of the topology of a graph of objects -- child ones,
parent ones, relationships between them, etc.  What's
this now about "contained in one object"?  You can have
a general 'this is my whole object-network' object
whose attributes let you access each and every one
of the actual objects (children, parents, &c), if
that is what you mean -- but the actual objects will
of course be *several* objects, not ONE.

>  Right now, I have the following, which is a three
> tier set of dictionaries that are associated with each
> other:
>
> ################################
> cs1_cmd1 = {'OPEN': 'RESULTS'}
> cs1_cmd2 = {'CLOSE': 'RESULTS'}
> cs2_cmd1 = {'OPEN': 'RESULTS'}
> cs2_cmd2 = {'CLOSE': 'RESULTS'}
> cs2_cmd3 = {'CLOSE2': 'RESULTS'}
> cs3_cmd1 = {'OPEN': 'RESULTS'}
> cs3_cmd2 = {'CLOSE': 'RESULTS'}
>
> command_set1 = {'cmd1': cs1_cmd1, 'cmd2': cs1_cmd2}
> command_set2 = {'cmd1': cs2_cmd1, 'cmd2': cs2_cmd2,
> 'cmd3': cs2_cmd3}
> command_set3 = {'cmd1': cs3_cmd1, 'cmd2': cs3_cmd2}
>
> test_set = {'TEST1': command_set1,'TEST2':
> command_set2,'TEST3': command_set3}
> #################################
>
> which I can manipulate with much weeping gnashing of

So the test_set is the 'whole network object', the
command_set* objects are the 'parents', and the
csX_cmdY objects are the 'children', right?

> teeth.  A couple of problems with this (other than the
> fact that it is just plain hideous to look at):
>
> 1) A child doesn't really know who its parent is.

Right.  So, let's tell the child who its parent is:

class Node:
    def __init__(self, parent=None, **kwds):
        self.parent = parent
        self.dict = kwds

Now, we can set a child's parent directly, with
child.parent = someParent.  At some point you may
want to have a method called when the parent is
set or changed, but let's keep things as simple
as possible -- to start with, direct access to
attributes is simplest.

So, with this tiny auxiliary class, your example
becomes (let's call your example "step 0", and
the following one "step 1"):

cs1_cmd1 = Node(OPEN='RESULTS')
cs1_cmd2 = Node(CLOSE='RESULTS')
cs2_cmd1 = Node(OPEN='RESULTS')
cs2_cmd2 = Node(CLOSE='RESULTS')
cs2_cmd3 = Node(CLOSE2='RESULTS')
cs3_cmd1 = Node(OPEN='RESULTS')
cs3_cmd2 = Node(CLOSE='RESULTS')
command_set1 = Node(cmd1=cs1_cmd1, cmd2=cs1_cmd2)
cs1_cmd1.parent = cs1_cmd2.parent = command_set1
command_set2 = Node(cmd1=cs2_cmd1, cmd2=cs2_cmd2,
        cmd3=cs2_cmd3)
cs2_cmd1.parent = cs2_cmd2.parent = \
    cs2_cmd3.parent = command_set2
command_set3 = Node(cmd1=cs3_cmd1, cmd2=cs3_cmd2)
cs3_cmd1.parent = cs3_cmd2.parent = command_set3
test_set = Node(TEST1=command_set1, TEST2=command_set2,
        TEST33=command_set3)
command_set1.parent = command_set2.parent = \
    commmand_set3.parent = test_set

Now, that's better, but clearly we can see it's
very fussy and error-prone to have to set the
parent attribute of every child manually after
we've created the parent.  Couldn't we automate
this?  Of course we could, giving us "step 2":

class Node:
    def __init__(self, parent=None, **kwds):
        self.parent = parent
        self.dict = kwds
        for child in kwds.values():
            try: child.parent = self
            except (TypeError,AttributeError): pass

Since the children are the values in the kwds dictionary,
we just loop over them, trying to set the parent
attribute appropriately for each -- we do the latter
in a try/except, since (as the cs_cmd1 &c show) the
children MIGHT not be of types for which we can set
the parent attribute, and that's OK.  With this new
and improved class Node, step 2 then becomes:

cs1_cmd1 = Node(OPEN='RESULTS')
    # like in step 1, 6 more lines like this snipped
command_set1 = Node(cmd1=cs1_cmd1, cmd2=cs1_cmd2)
command_set2 = Node(cmd1=cs2_cmd1, cmd2=cs2_cmd2,
        cmd3=cs2_cmd3)
command_set3 = Node(cmd1=cs3_cmd1, cmd2=cs3_cmd2)
test_set = Node(TEST1=command_set1, TEST2=command_set2,
        TEST33=command_set3)

Much better, I think.  Now, let's move to your
second problem:

> 2) The data is not dynamic: it has to be built from
> within the script at design time and not run time.

So let's do it at runtime.  Presumably the code
can read a text file giving the data about the
children and parents and interpret it.  If you
think about it, the text file will be a little
language, and it's a hard job designing a better
language than Python, so, why not use Python for
the purposes?  You can just execfile(...) a user
script that adds as many Node objects as you
want, maybe.  But if for some reason you have
other formats of text files, it may not be all
that hard to interpret them.  Suppose for example
that what you have is a text full of lines of
either of the two forms:

DATA name1 name2 data
CHILD name1 name2 name3

where the first form means: add string 'data'
(more or less arbitrary) as entry named as per
name2 in Node named as per name1, creating the
Node if need be; and the second form means,
add the Node named name3 as child named as per
name2 in Node named as per name1, creating Nodes
if need be.
To interpret such a 'little program' and
create the corresponding data structure,
we need a name-to-Nodeobject dictionary --
we could use variables, but that's by far
too much trouble -- so let's use an explicit
dictionary instead, wrapped in a class for
convenience.

Needed infrastructure for this, e.g:
class Nodenames:
    def __init__(self):
        self.namedict = {}
    def get(self, name):
        return self.namedict.setdefault(
            name, Node())

and the process of adding a file of the
above little-language given a Nodenames
instance becomes:

def addafile(thefile, nodenames):
    for line in thefile.readlines():
        line = line.strip()   # remove whitespace
        # let's accept empty lines & comments too
        if line='' or line.startswith('#'): continue
        try:
            cmd, name1, name2, rest = line.split(' ',3)
        except ValueError:
            print "Too few fields in line (%s), skipped"%line
            continue
        if cmd=="DATA":
            thenode = nodenames.get(name1)
            thenode.dict[name2] = rest
        elif cmd=="CHILD":
            parent = nodenames.get(name1)
            child = nodenames.get(name3)
            parent.dict[name2] = child
            child.parent = parent
        else:
            print "Invalid cmd (%s) in line (%s), skipped"%(
                cmd, line)

There -- not too bad, right?  Now, this IS untested
code (and one should always write the test code
first, THEN the code that makes the test pass...!),
so it might not work, but I hope you get the
general idea anyway.

> 3) On the dynamic topic, I can't just add a child.  I
> have to create kind of a reference in the parent to
> another dictionary.

See above -- we do add a child via the reference in
the parent &c.  No problem.

> 4) It is scattered across 11 dictionaries (even though
> it has one base dictionary that acts as the root
> parent).

And this is a problem BECAUSE...?  In other words,
you've described a natural consequence of how a
dynamically built network of nodes *WANTS* to be --
"scattered" across several node objects (each of
which uses one or more dictionaries as the most
normal Pythonic way to implement itself, of course).


> I want to do the above more elegantly, more concisely
> and less painfully.  I have read the python FAQ,
> searched usenet, searched Google, scoured
> www.python.org and have only come up with a few
> possibilities for doing what I want to do (such as
> metaclasses and acquisition and some virtual file
> systems), but I am not sure that they will address the
> solution for the problem that I have.

Keep It Simple, Sil^H^H^H Jeremy: metaclasses and
acquisition are BY FAR overkill for your needs.


> Does anyone have any ideas on how I can begin to
> attack my problem?  Does anyone have any ideas on
> specific methodologies for dynamically named
> hierarchical data?

There's a Design Pattern that handles this, and
it's implemented in the Node class above -- use
the same class for both parents and children
since parents can also be children (to put it in
the simplest of terms:-).  You could use XML,
of course (it's all about named hierarchical
data), but, again -- avoid overkill...


Alex






More information about the Python-list mailing list