[Tutor] Is this called a 'Hash table method for string mapping'

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Mon Sep 4 19:37:28 CEST 2006


> So, what if one has to map a string of 15 character nucleotide to a 
> jumbo string of characters of length in millions.

Hi Srinivas,

If you're doing research into this, I strongly recommend you take a look 
at Dan Gusfield's excellent textbook "Algorithms on Strings, Trees, and 
Sequences":

     http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=0521585198

Reinvention is fun, but there's already been a lot of work poured into 
this field; the textbook above provides more background, so you can stand 
on previous efforts.

[code cut]


> Do you think is there any flaw in this.

It works, and I don't see anything immediately wrong with this.  The only 
issue is that it'll be only exact matching on the oligos that have been 
stored in that hashtable.


> Is there any better method that is possible.

It depends on what you're looking for.  Note that the approach you're 
using will only match properly if your oligo width matches the query 
string width.  This may or may not be a problem for you.


Gusfield's textbook above surveys several other methods of doing string 
matching that are particularly suitable for biologists.  If you're 
planning to do a lot of small queries on a large string like that, then a 
data structure such as a "suffix tree" is worth knowing.

     http://en.wikipedia.org/wiki/Suffix_tree


You may also want to look into specialized applications; arabidopsis.org 
hosts a program called 'patmatch' that may be helpful for you:

     http://nar.oxfordjournals.org/cgi/content/abstract/33/suppl_2/W262

(As a disclaimer: I did some work on patmatch.)


Good luck.


More information about the Tutor mailing list