[Tutor] mxTextTools help: find the deepest in nest lists
pan at uchicago.edu
pan at uchicago.edu
Wed Dec 10 16:41:27 EST 2003
Hi Danny,
I did try the re approach earlier, but I got an "maximum recursion"
error when the list gets complicated. I guess it is not good for
large nested list.
So I turned to mxTextTools, which I failed to finish learning it
MANY MANY times before. But this time I decided to twist my brain
a lot further to see if I can finally learn how it works.
I am so interested because first of all it was said to be very fast,
which might be critical if I wanna make some programs to work in
gene/genome analysis; and secondly, it uses the state machine concept,
which is quite alian for a people like me with major in biology. I can
sense it is powerful in the future application on our field, but i need
to learn before I can decide how right (or wrong) I am.
The tagtable concept is quite different from re. Basically it is a
collection of 'states'. When parsing, the core engine reads through
the target text, check if the read text match a state. Defined in
each state is a decision tree, which directs which state will become
the next state when matched and which when failed.
The mxTextTools itself is quite a briliant work, but the
documentation --- IMHO ---- doesn't mean to let someone who doesn't
already know about the concept understand. So it is almost impossible
to learn, especially for a beginner like me.
And, so far, I can't find any tutorial detailed enough. Provided in
the chapter 4 of David Mertz's 'Text processing in Python' is a much
better documentation but a brief glance seems to get me an impression that
the example given is too complicated for a beginner. But, this might change
after I have time to go over the entire mxTextTools part back and forth
several times.
Anyway, thx for the help. If in the end I finally "twist my head"
successfully and learn how it works, I might write a tutorial on
it.
pan
Quoting Danny Yoo <dyoo at hkn.eecs.berkeley.edu>:
>
>
> On Tue, 9 Dec 2003 pan at uchicago.edu wrote:
>
> > I am learning how to use mxTextTools. I need help with
> > the following ...
> >
> > I wanna match the deepest lists in a nested list tree.
> > For example:
> >
> > The [2, 2a] and [3, 3b] in :
> >
> > [[[1,[2, 2a]],[[3,3b],4]],6]
> >
> > The [a.1,b_2] and [d.3,e] in :
> >
> > [[a.1,b_2],[[c,[d.3,e]],f]]
> >
> > The [a1,a2], [d.1, d.2] and [o3,o4] in :
> >
> > [[[[[a1,a2],out],[d.1, d.2]],o2],[[o3,o4],o5]]
> >
> > How can I setup a tagtable for matching them ??
>
>
> Hi Pan,
>
> I'm not familiar enough with mxTextTools's tag table stuff. Just to make
> sure though: are you sure that a text-handling approach is best for this
> problem? Is your input a originally a data structure, or a string?
>
>
> One way to approach this with regular expressions might be something like
> this:
>
> ###
> >>> import re
> >>> def find_innermost_nodes(tree_string):
> ... regex = re.compile(r"""\[ ## literal open brace
> ... [^\[]+ ## one or more non-brace
> ... ## characters
> ... \] ## literal close brace
> ... """, re.VERBOSE)
> ... return regex.findall(tree_string)
> ...
> >>> find_innermost_nodes("[[[1,[2, 2a]],[[3,3b],4]],6]")
> ['[2, 2a]]', '[3,3b],4]],6]']
> ###
>
>
>
> Ooops! Forgot to make the search nongreedy:
>
> ###
> >>> def find_innermost_nodes(tree_string):
> ... regex = re.compile(r"""\[ ## literal open brace
> ... [^\[]+? ## one or more non-brace
> ... ## characters, nongreedily
> ... \] ## literal close brace
> ... """, re.VERBOSE)
> ... return regex.findall(tree_string)
> ...
> >>> find_innermost_nodes("[[[1,[2, 2a]],[[3,3b],4]],6]")
> ['[2, 2a]', '[3,3b]']
> ###
>
>
> That's better. *grin* Aren't the tag tables similar to regular
> expressions?
>
>
>
>
> That being said, I think this approach is approaching the problem in a way
> that doesn't generalize well. If we had some tree structure, I'd approach
> this with something like:
>
> ###
> def is_innermost_node(node):
> if is_leaf(node): return false
> return is_leaf(left(node)) and is_leaf(right(node))
>
> def find_innermost_nodes(node):
> if is_leaf(node):
> return []
> if is_innermost_node(node):
> return [node]
> return (find_innermost_nodes(left(node)) +
> find_innermost_nodes(right(node)))
> ###
>
> assuming that we have some functions for seeing if a node is a leaf
> (is_leaf()), and well as ways of getting the left() and right() children
> of a tree node.
>
>
> Good luck to you!
>
More information about the Tutor
mailing list