[Tutor] Printing regular expression match
Srinivas Iyyer
srini_iyyer_bio at yahoo.com
Sun Dec 4 02:28:23 CET 2005
Hi Danny,
thanks for your email.
In the example I've shown, there are no odd elements
except for character case.
In the real case I have a list of 100 human gene
names.
Human gene names are conventionally written in
uppercase (e.g. DDX3X). However, in NCBI's gene_info
dataset the gene names are reported in lowercase (e.g.
ddx3x). I want to extract the rest of the information
for DDX3X from NCBI's file (the dataset is in
tab-delimited format).
My approach was: if I can establish that DDX3X is
identical to ddx3x, then I want to print that line from
the other list (NCBI's gene_info dataset).
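
A minimal sketch of that approach, offered only as an illustration. It
assumes each line is tab-delimited and that the gene name sits in one
known column. The function name select_gene_rows, the symbol_column
parameter, and the sample rows are made up for this example and are not
taken from the real gene_info file.

```python
def select_gene_rows(lines, wanted_names, symbol_column=2):
    """Return the tab-delimited rows whose gene-name column matches one
    of wanted_names, ignoring character case.

    symbol_column is an assumption: pass the index of the column that
    holds the gene name in your copy of the file.
    """
    # Lowercase the short list once so every lookup is a cheap set test.
    wanted = set(name.lower() for name in wanted_names)
    selected = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > symbol_column and fields[symbol_column].lower() in wanted:
            selected.append(fields)
    return selected


# Placeholder rows standing in for lines read from the big file.
sample_lines = [
    "tax\tid1\tddx3x\tplaceholder description\n",
    "tax\tid2\tabc1\tplaceholder description\n",
    "tax\tid3\txyz2\tplaceholder description\n",
]
my_genes = ["DDX3X", "XYZ2"]

matches = select_gene_rows(sample_lines, my_genes)
for row in matches:
    print("\t".join(row))
```

With a real file, the list of lines could be replaced by an open file
object, so the 150K lines are scanned once and only the matching rows
are kept.
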
I guess I misunderstood your suggestion. In that
case, why do I have to drop something from list b
(which is over 150K lines)? If I can create a sublist
of the elements in b that I need (a small list of 100),
then that is easier. This is my opinion.
-srini
--- Danny Yoo <dyoo at hkn.eecs.berkeley.edu> wrote:
>
>
> On Sat, 3 Dec 2005, Srinivas Iyyer wrote:
> > >>> a
> > ['apple', 'boy', 'boy', 'apple']
> >
> > >>> b
> > ['Apple', 'BOY', 'APPLE-231']
> >
> > >>> for i in a:
> >         pat = re.compile(i, re.IGNORECASE)
> >         for m in b:
> >             if pat.match(m):
> >                 print m
>
>
> Hi Srinivas,
>
> We may want to change the problem so that it's less focused on
> "print"ing results directly. We can rephrase the question as a list
> "filtering" operation: we want to keep the elements of b that satisfy
> a certain criterion.
>
>
> Let's give a name to that criterion now:
>
> ######
> def doesNameMatchSomePrefix(word, prefixes):
>     """Returns True if the input word is matched by some prefix in
>     the input list of prefixes. Otherwise, returns False."""
>     # ... fill me in
>
> ######
>
>
> Can you write doesNameMatchSomePrefix()? In fact, you might not even
> need regexes to write an initial version of it.
>
>
>
> If you can write that function, then what you're asking:
>
> > I do not want python to print both elements from lists a and b. I just
> > want only the elements in the list B.
>
> should not be so difficult: it'll be a straightforward loop across b,
> using that helper function.
>
>
>
> (Optimization can be done to make doesNameMatchSomePrefix() fast, but
> you probably should concentrate on correctness first. If you're
> interested in doing something like this for a large number of
> prefixes, you might be interested in:
>
>     http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/
>
> which has more details and references to specialized modules that
> attack the problem you've shown us so far.)
>
>
> Good luck!
>
>
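
For reference, one way the helper described in the quoted message might
be filled in without regexes. This is only a sketch: it assumes
"matched by some prefix" means the word starts with one of the prefixes
once case is ignored, which mirrors pat.match() with re.IGNORECASE for
plain names that contain no regex metacharacters.

```python
def doesNameMatchSomePrefix(word, prefixes):
    """Returns True if the input word is matched by some prefix in
    the input list of prefixes. Otherwise, returns False."""
    lowered = word.lower()
    # Assumption: a "match" is a case-insensitive startswith test.
    for prefix in prefixes:
        if lowered.startswith(prefix.lower()):
            return True
    return False


a = ['apple', 'boy', 'boy', 'apple']
b = ['Apple', 'BOY', 'APPLE-231']

# The filtering loop across b: keep only the elements that satisfy
# the criterion, each one at most once.
kept = [word for word in b if doesNameMatchSomePrefix(word, a)]
print(kept)
```

Because the loop runs over b rather than a, the duplicate entries in a
no longer cause an element of b to be reported twice.
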