[Tutor] Printing regular expression match

Srinivas Iyyer srini_iyyer_bio at yahoo.com
Sun Dec 4 02:28:23 CET 2005


Hi Danny, 
thanks for your email. 

In the example I've shown, there are no odd elements
except for character case. 

In the real case I have a list of 100 human gene
names. 
Human gene names are conventionally written in
upper case (e.g. DDX3X).  However, in NCBI's gene_info
dataset the gene names are reported in lowercase (e.g.
ddx3x).  I want to extract the rest of the information
for DDX3X from NCBI's file (the dataset is in
tab-delimited format). 

My approach was: if I can treat DDX3X as identical to
ddx3x, then I want to print the matching line from the
other list (NCBI's gene_info dataset). 
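
A minimal sketch of that approach, assuming the gene
symbol sits in one tab-delimited column of each line.
The function name, the column index and the sample
rows are made up for illustration; they are not taken
from the real gene_info file.

```python
def lines_for_genes(gene_names, lines, symbol_col=0):
    """Return the lines whose symbol column matches one of
    gene_names, ignoring character case."""
    # Lowercase the wanted names once; set lookup is fast even
    # when the other list has 150K lines.
    wanted = set(name.lower() for name in gene_names)
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # symbol_col is an assumption: adjust it to the column
        # that actually holds the gene symbol.
        if len(fields) > symbol_col and fields[symbol_col].lower() in wanted:
            hits.append(line)
    return hits


# Made-up sample rows standing in for the real dataset.
genes = ["DDX3X", "TP53"]
sample = [
    "ddx3x\tmade-up annotation one\n",
    "abc1\tmade-up annotation two\n",
    "tp53\tmade-up annotation three\n",
]
for line in lines_for_genes(genes, sample):
    print(line.rstrip("\n"))
```

This compares whole names rather than prefixes, so
ddx3x would match DDX3X but not a longer symbol that
merely starts with it.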

I guess I misunderstood your suggestion.  In that
case, why do I have to drop anything from list b
(which is over 150K lines)?  If I can create a sublist
of the matching elements in b (a small list of 100),
that would be easier.  That is my opinion. 

-srini


--- Danny Yoo <dyoo at hkn.eecs.berkeley.edu> wrote:

> 
> 
> On Sat, 3 Dec 2005, Srinivas Iyyer wrote:
> > >>> a
> > ['apple', 'boy', 'boy', 'apple']
> >
> > >>> b
> > ['Apple', 'BOY', 'APPLE-231']
> >
> > >>> for i in a:
> > 	pat = re.compile(i,re.IGNORECASE)
> > 	for m in b:
> > 		if pat.match(m):
> > 			print m
> 
> 
> Hi Srinivas,
> 
> We may want to change the problem so that it's less focused on "print"ing
> results directly.  We can rephrase the question as a list "filtering"
> operation: we want to keep the elements of b that satisfy a certain
> criterion.
> 
> 
> Let's give a name to that criterion now:
> 
> ######
> def doesNameMatchSomePrefix(word, prefixes):
>     """Returns True if the input word is matched by some prefix in
>     the input list of prefixes.  Otherwise, returns False."""
>     # ... fill me in
> 
> ######
> 
> 
> Can you write doesNameMatchSomePrefix()?  In fact, you might not even need
> regexes to write an initial version of it.
> 
> 
> 
> If you can write that function, then what you're asking:
> 
> > I do not want python to print both elements from lists a and b.  I just
> > want only the elements in the list B.
> 
> should not be so difficult: it'll be a straightforward loop across b,
> using that helper function.
> 
> 
> 
> (Optimization can be done to make doesNameMatchSomePrefix() fast, but you
> probably should concentrate on correctness first.  If you're interested in
> doing something like this for a large number of prefixes, you might be
> interested in:
> 
>     http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/
> 
> which has more details and references to specialized modules that attack
> the problem you've shown us so far.)
> 
> 
> Good luck!
> 
> 
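
For the toy lists quoted above, the helper could be
filled in without regexes roughly as follows.  This is
only one possible reading of the exercise: it assumes
that "matched by a prefix" means the word starts with
that prefix, ignoring case.

```python
def doesNameMatchSomePrefix(word, prefixes):
    """Returns True if the input word starts with some prefix in
    the input list of prefixes, ignoring case.  Otherwise, returns
    False."""
    lowered = word.lower()
    for prefix in prefixes:
        if lowered.startswith(prefix.lower()):
            return True
    return False


a = ['apple', 'boy', 'boy', 'apple']
b = ['Apple', 'BOY', 'APPLE-231']

# Loop across b once, so each element of b shows up at most once
# even though a contains duplicates.
matches = [word for word in b if doesNameMatchSomePrefix(word, a)]
print(matches)   # ['Apple', 'BOY', 'APPLE-231']
```

Because this is a prefix test, 'APPLE-231' is kept as
well.  For exact gene names, an equality test on the
lowercased strings would be the stricter choice.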



		


