Accessing a parse tree

Fri Apr 17 07:22:57 EDT 2009

On Apr 17, 4:03 am, Clarendon <jine... at hotmail.com> wrote:
> Thank you very much for this information. It seems to point me to the
> right direction. However, I do not fully understand the flatten
> function and its output. Some indices seem to be inaccurate. I tried
> to find this function at nltk.tree.Tree.flatten, but it returns a
> flattened tree, not a tuple.
>
> So your flatten function must be a different one, and it's not one of
> the builtins, either. Could you tell me where I can find the
> documentation about this flatten function?

No, I can't; it is a different one, and I don't even have it.  We'd
have to write it.

The indices weren't included in the flattened tree, but if you're
writing the function yourself, it can include them.

0: ( 'ROOT', None, <object>, None --no parent--, 0 )
1: ( 'S', None, <object>, 0 --parent is 'ROOT'--, 1 )
2: ( 'NP', None, <object>, 1 --parent is 'S'--, 2 )
3: ( 'PRP', 'I', <object>, 2 --parent is 'NP'--, 3 )
4: ( 'VP', None, <object>, 1 --parent is 'S'--, 2 )
5: ( 'VBD', 'came', <object>, 4 --parent is 'VP'--, 2 )

I screwed up the 'depth' field on #5.  It should be:
5: ( 'VBD', 'came', <object>, 4 --parent is 'VP'--, **3** )

Otherwise I'm not sure what you mean by 'indices seem to be
inaccurate', though I can't be completely sure my list is right
either.  After all, I did it by hand, not by program.
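Doing it by program could look roughly like the sketch below.  It
assumes the parse tree is held as nested lists of the form
[label, child, child, ...], where a preterminal's only child is the
word string; that representation and the name `flatten_tree` are
hypothetical, not NLTK's API.  Each row is (label, word, node,
parent index, depth), matching the fields in the list above, and the
row's position in the result is its index.

```python
# Hypothetical nested-list stand-in for the parse tree of "I came":
# [label, child, child, ...]; a preterminal's single child is the word.
tree = ['ROOT',
        ['S',
         ['NP', ['PRP', 'I']],
         ['VP', ['VBD', 'came']]]]


def flatten_tree(node, parent=None, depth=0, rows=None):
    """Return (label, word, node, parent_index, depth) tuples in preorder."""
    if rows is None:
        rows = []
    label, children = node[0], node[1:]
    # A preterminal has exactly one child, and that child is the word.
    if len(children) == 1 and isinstance(children[0], str):
        word = children[0]
    else:
        word = None
    index = len(rows)  # this node's index is its position in the list
    rows.append((label, word, node, parent, depth))
    if word is None:
        # Recurse into subtrees, recording this node as their parent.
        for child in children:
            flatten_tree(child, index, depth + 1, rows)
    return rows


for i, (label, word, node, parent, depth) in enumerate(flatten_tree(tree)):
    print(i, label, word, parent, depth)
```

Run on this tree, the parent and depth columns come out as in the
hand-built list, including a depth of 3 for 'VBD'.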

If your package comes with a flatten function, it would be a good
place to start.  Flatten functions can get hairy.  What is its code,
and what is its output?

Here's an example:

>>> a = ['p', [['q', 'r'], 's', 't'], 'u']
>>> a
['p', [['q', 'r'], 's', 't'], 'u']
>>> def flatten(x):
...     for y in x:
...         if isinstance(y, list):
...             for z in flatten(y):
...                 yield z
...         else:
...             yield y
...
>>> list(flatten(a))
['p', 'q', 'r', 's', 't', 'u']