Accessing a parse tree

Fri Apr 17 09:22:12 EDT 2009

On 17/04/2009 7:32 PM, Clarendon wrote:
> Dear John Machin

I presume that you replied to me instead of the list accidentally.

> 
> So sorry about the typo. It should be: "the program should *see* that
> the designated *words* are..."
> 
> "a long way" has two parentheses to the left -- (VP (DT -- before it
> hits a separate group -- VBD came).

Like I said, the parentheses are an artifact of one particular visual 
representation of the parse tree. Your effort at clarification has 
introduced new unexplained terminology ("separate group"). BTW if you 
plan to persist with parentheses, you might at least display the tree in 
a somewhat more consistent fashion, discovering in the process that you 
are two parentheses short:
(ROOT
     (S
         (NP
             (PRP I)
         )
         (VP
             (VBD came)
             (NP
                 (DT a)
                 (JJ long)
                 (NN way)
             )
             (PP
                 (IN in)
                 (S
                     (VP
                         (VBG changing)
                         (NP
                             (PRP$ my)
                             (NN habit)
                         )
                     )
                 )
             )
         )
Now look at this:
ROOT
     S
         NP
             PRP I
         VP
             VBD came
             NP
                 DT a
                 JJ long
                 NN way
             PP
                 IN in
                 etc etc
No parentheses, and no loss of information.
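To make that concrete, here is a minimal Python sketch (plain Python, not NLTK; the nested-tuple representation and the `render` function are my own hypothetical choices) that holds the same tree as nested tuples and prints the indented view, one node per line, with no parentheses in the output.

```python
# Hypothetical representation: an internal node is (label, child, child, ...)
# and a preterminal is (tag, word).
TREE = ("ROOT",
        ("S",
         ("NP", ("PRP", "I")),
         ("VP",
          ("VBD", "came"),
          ("NP", ("DT", "a"), ("JJ", "long"), ("NN", "way")),
          ("PP",
           ("IN", "in"),
           ("S",
            ("VP",
             ("VBG", "changing"),
             ("NP", ("PRP$", "my"), ("NN", "habit"))))))))


def render(node, depth=0):
    """Return the tree as indented lines, one node per line, no parentheses."""
    label, *children = node
    indent = "    " * depth
    if len(children) == 1 and isinstance(children[0], str):
        # Preterminal: the tag and its word share a line.
        return [f"{indent}{label} {children[0]}"]
    lines = [f"{indent}{label}"]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


print("\n".join(render(TREE)))
```

The nesting depth alone carries the structure that the parentheses carried.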

In fact if you keep the parentheses and lose all whitespace except a 
space between each node-type and its terminal word, you'll see that the 
parenthesis notation is just one way of serialising the tree.
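A rough sketch of that round trip, assuming plain Python with only the standard library (the function names are hypothetical, and this is not how NLTK does it): tokenise the bracketed text, rebuild the nested structure, and serialise it again. Dropping the last two closing parentheses, as in your original, makes the parse fail.

```python
import re

# The example tree, serialised with parentheses; extra whitespace is irrelevant.
BRACKETED = """(ROOT (S (NP (PRP I)) (VP (VBD came) (NP (DT a) (JJ long) (NN way))
  (PP (IN in) (S (VP (VBG changing) (NP (PRP$ my) (NN habit))))))))"""


def tokenize(text):
    """Split into '(', ')' and atoms (labels or words); whitespace is dropped."""
    return re.findall(r"[()]|[^\s()]+", text)


def parse(text):
    """Rebuild the tree as nested tuples: (label, child, ...) or (tag, word)."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        if pos >= len(tokens):
            raise ValueError("ran out of input: unbalanced parentheses")
        return tokens[pos]

    def node():
        nonlocal pos
        if peek() != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        label = peek()
        if label in "()":
            raise ValueError(f"missing label at token {pos}")
        pos += 1
        children = []
        while peek() != ")":
            if peek() == "(":
                children.append(node())
            else:
                children.append(peek())  # a terminal word
                pos += 1
        pos += 1  # consume the closing ')'
        return (label, *children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("extra input after the tree")
    return tree


def serialise(tree):
    """Write nested tuples back out in compact parenthesised form."""
    label, *children = tree
    parts = [c if isinstance(c, str) else serialise(c) for c in children]
    return "(" + " ".join([label, *parts]) + ")"


print(serialise(parse(BRACKETED)))

try:
    parse(BRACKETED[:-2])  # two closing parentheses short
except ValueError as exc:
    print("two short:", exc)
```

The tokeniser also copes with the fully compacted form, e.g. `(NP(DT a)(JJ long)(NN way))`, because only the space between a tag and its word is significant.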

You have a tree structure, with the parsed information built on top of 
the words (terminals). A very quick flip through the NLTK tutorial gave 
me the impression that it very likely has all you 
need -- and a bazillion other things, which is probably why you can't 
find what you want :-) I certainly saw trees whose nodes know their 
parents mentioned as an option.

Suggestions:
1. Get a pencil and a piece of paper, write "ROOT" at the top in the 
centre, and write "I came a long way in ......" spaced across the 
bottom. Fill in the parse tree.
2. Express your requirement in terms of moving around the tree, 
following pointers to parent, left/elder sibling (if any), right/younger 
sibling (if any), and children. E.g. the 3 parse nodes for "a long way" 
are "DT JJ NN" and their parent is "NP". NP's left sibling is a VBD node 
("came") and its right sibling is a PP ("in .....").
3. Then have another look at the NLTK docs.
4. Ask questions on the NLTK mailing list.
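Suggestion 2 might look something like the following minimal sketch. It assumes a hand-rolled `Node` class with parent and sibling pointers (hypothetical, not an NLTK API) built from nested tuples. As far as I know NLTK has a `ParentedTree` class offering similar parent and sibling accessors; check the docs for the version you have.

```python
# Hypothetical input format: (label, child, ...) or (tag, word) for preterminals.
TREE = ("ROOT",
        ("S",
         ("NP", ("PRP", "I")),
         ("VP",
          ("VBD", "came"),
          ("NP", ("DT", "a"), ("JJ", "long"), ("NN", "way")),
          ("PP",
           ("IN", "in"),
           ("S",
            ("VP",
             ("VBG", "changing"),
             ("NP", ("PRP$", "my"), ("NN", "habit"))))))))


class Node:
    """A parse-tree node with parent, child and sibling navigation (hypothetical)."""

    def __init__(self, label, word=None):
        self.label = label
        self.word = word  # set only for preterminals such as (DT a)
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)

    def _index(self):
        return self.parent.children.index(self) if self.parent else None

    def left_sibling(self):
        i = self._index()
        # None for the root (i is None) and for a first child (i == 0).
        return self.parent.children[i - 1] if i else None

    def right_sibling(self):
        if self.parent is None:
            return None
        i = self._index()
        siblings = self.parent.children
        return siblings[i + 1] if i + 1 < len(siblings) else None

    def preterminals(self):
        """Yield the (tag, word) nodes left to right."""
        if self.word is not None:
            yield self
        else:
            for child in self.children:
                yield from child.preterminals()


def build(spec, parent=None):
    label, *rest = spec
    if len(rest) == 1 and isinstance(rest[0], str):
        node = Node(label, rest[0])
    else:
        node = Node(label)
        for child in rest:
            build(child, node)
    if parent is not None:
        parent.add(node)
    return node


root = build(TREE)
by_word = {n.word: n for n in root.preterminals()}  # words are unique here
span = [by_word[w] for w in ("a", "long", "way")]
np_node = span[0].parent

print([n.label for n in span], "share parent", np_node.label)
print("left sibling:", np_node.left_sibling().label, np_node.left_sibling().word)
print("right sibling:", np_node.right_sibling().label)
```

Looking words up in a dict only works because each word occurs once in this sentence; a real version would work with leaf positions instead.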

HTH,
John