[Tutor] mapping problem

Kent Johnson kent37 at tds.net
Wed Feb 1 23:48:46 CET 2006


Srinivas Iyyer wrote:
> Dear group, 
>   I have a problem in finding a method to solve a
> problem where I want to walk through a lineage of
> terms and group them from right to left.
> 
> A snippet of the problem is here. The terms in file as
> tab delim manner. 
> 
> a	b	c	d	car
> a	b	c	f	truck
> a	b	c	d	van
> a	b	c	d	SUV
> a	b	c	f	18-wheeler
> a	b	j	k	boat
> a	b	j	a	submarine
> a	b	d	a	B-747
> a	b	j	c	cargo-ship
> a	b	j	p	passenger-cruise ship
> a	b	a	a	bicycle
> a	b	a	b	motorcycle
> 
> 
> Now my question is to enrich members that have
> identical lineage with different leaf.
> 'i.e': a b c d - van suv . I have two terms in this
> path and I am not happy with two. I wish to have more.
> 
> Then: a b c - car, van, truck, SUV and 18-wheeler
> (automobiles that travel on road). I am happy with
> this grouping and I enriched more items if I walk on
> lineage: (a-b-c)

I'm not sure I understand what you want to do, but I think a tree where 
each internal node is a dict and each leaf node is a list will do what 
you want. You would end up with something like
tree['a']['b']['c']['d'] = ['car', 'van', 'SUV']

To find the value for a b c you would traverse the tree to that point, 
then accumulate all the leaf nodes underneath.

OK, I guess I feel like writing code. Here is a simple implementation. It 
requires that all the leaves be at the same depth. It's not particularly 
clever, but it works and shows what can be done just by hooking together 
basic data structures.

raw_data = '''a   b   c   d   car
a   b   c   f   truck
a   b   c   d   van
a   b   c   d   SUV
a   b   c   f   18-wheeler
a   b   j   k   boat
a   b   j   a   submarine
a   b   d   a   B-747
a   b   j   c   cargo-ship
a   b   j   p   passenger-cruise ship
a   b   a   a   bicycle
a   b   a   b   motorcycle'''.splitlines()

tree = {}

# This builds the tree of nested dictionaries with lists as the leaves
for line in raw_data:
     # maxsplit=4 keeps a multi-word leaf such as 'passenger-cruise ship' whole
     data = line.split(None, 4)
     keys = data[:4]
     value = data[4]

     # This loop makes the dict nodes
     subtree = tree
     for key in keys[:-1]:
         subtree = subtree.setdefault(key, {})

     # The last key points to a list, not a dict
     lastkey = keys[-1]
     subtree.setdefault(lastkey, []).append(value)


def iter_leaves(subtree):
     ''' Recursive generator that yields all the leaf nodes of subtree '''
     if isinstance(subtree, list):
         # A leaf node
         yield subtree
         return

     for item in subtree.values():
         for leaf in iter_leaves(item):
             yield leaf

def get_leaves(*keys):
     ''' Return a list of all the leaves in the subtree pointed to by *keys '''
     subitem = tree
     for key in keys:
         subitem = subitem[key]

     leaves = []
     for leaf in iter_leaves(subitem):
         leaves.extend(leaf)

     return leaves

print(get_leaves('a', 'b', 'c', 'd'))
print(get_leaves('a', 'b', 'c'))

## prints

['car', 'van', 'SUV']
['car', 'van', 'SUV', 'truck', '18-wheeler']
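
If the leaves in your real data are not all at the same depth, a flat index 
might be easier than a tree. The following is only a sketch, not something 
I have run against your file: it assumes the last tab-delimited column is 
always the leaf, and the names build_prefix_index, sample_lines and 
prefix_index are made up for illustration. Each leaf is stored under every 
prefix of its lineage, so a lookup is a single dict access.

```python
# A few of the sample rows, tab-delimited as in the original file.
sample_lines = [
    "a\tb\tc\td\tcar",
    "a\tb\tc\tf\ttruck",
    "a\tb\tc\td\tvan",
    "a\tb\tc\td\tSUV",
    "a\tb\tc\tf\t18-wheeler",
    "a\tb\tj\tp\tpassenger-cruise ship",
]

def build_prefix_index(lines):
    ''' Map every lineage prefix (a tuple of keys) to the leaves below it.
    Assumes the last column of each line is the leaf (hypothetical helper). '''
    index = {}
    for line in lines:
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        keys, leaf = fields[:-1], fields[-1]
        # Register the leaf under ('a',), ('a', 'b'), ('a', 'b', 'c'), ...
        for depth in range(1, len(keys) + 1):
            index.setdefault(tuple(keys[:depth]), []).append(leaf)
    return index

prefix_index = build_prefix_index(sample_lines)
print(prefix_index[("a", "b", "c", "d")])
print(prefix_index[("a", "b", "c")])
```

Here the leaves come back in file order rather than grouped by subtree, and 
each leaf is stored once per level of its lineage, so the index uses more 
memory than the tree. Since build_prefix_index accepts any iterable of 
lines, an open file object should work in place of the list.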

Kent

> 
> 
> Thus, I want to try to enrich for all 21 K lines of
> lineages.
> 
> My question:
> 
> Is there a way to automate this problem?
> 
> My idea of doing this:
> 
> Since this is a tab delim file. I want to read a line
> with say 5 columns (5 tabs).  Search for items with
> same column item 4 (because leaf items could be
> unique). If I find a hit, then check if columns 3 and
> 2 are identical if so create a list. 
> 
> Although this problem is more recursive and time and
> resource consuming, I cannot think of an easy
> solution. 
> 
> Would you please suggest a nice and simple method to
> solve this problem. 
> 
> For people who are into bioinformatics (I know Danny
> Yoo is a bioinformatician) the question is about GO
> terms.  I parsed OBO file and laid out the term
> lineages that constitute the OBO-DAG structure. I want
> to enrich the terms to do an enrichment analysis for a
> set of terms that I am interested in.
> 
> Thank you in advance.
> 
> cheers
> Srini