Efficient way of testing for substring being one of a set?
Jeff
jeffober at gmail.com
Thu Apr 3 08:59:33 EDT 2008
On Apr 3, 8:19 am, George Sakkis <george.sak... at gmail.com> wrote:
> On Apr 3, 8:03 am, Jeff <jeffo... at gmail.com> wrote:
>
> > def foo(sample, strings):
> > for s in strings:
> > if sample in s:
> > return True
> > return False
>
> > This was an order of magnitude faster for me than using str.find or
> > str.index. That was finding rare words in the entire word-list (w/
> > duplicates) of War and Peace.
>
> If you test against the same substrings over and over again, an
> alternative would be to build a regular expression:
>
> import re
> search = re.compile('|'.join(re.escape(x)
> for x in substrings)).search
> p = search(somestring)
> if p is not None:
> print 'Found', p.group()
>
> George
That would be an enormous regular expression and eat a lot of memory.
But over an enormous number of substrings, it would be O(log n),
rather than O(n).
More information about the Python-list
mailing list