counting number of (overlapping) occurances

Ben Cartwright bencvt at gmail.com
Fri Mar 10 01:25:06 EST 2006


John wrote:
> This works but is a bit slow, I guess I'll have to live with it.
> Any chance this could be sped up in python?

Sure, to a point.  Instead of:

  def countoverlap(s1, s2):
      return len([1 for i in xrange(len(s1)) if s1[i:].startswith(s2)])

Try this version, which takes smaller slices (resulting in 2x-5x speed
increase when dealing with a large s1 and a small s2):

  def countoverlap(s1, s2):
      L = len(s2)
      return len([1 for i in xrange(len(s1)-L+1) if s1[i:i+L] == s2])

And for a minor extra boost, this version eliminates the list
comprehension:

  def countoverlap(s1, s2):
      L = len(s2)
      cnt = 0
      for i in xrange(len(s1)-L+1):
          if s1[i:i+L] == s2:
              cnt += 1
      return cnt

Finally, if the execution speed of this function is vital to your
application, create a C extension.  String functions like this one are
generally excellent candidates for extensionizing.

--Ben




More information about the Python-list mailing list