str.count is slow

Mon Feb 27 19:22:52 EST 2006

"Ben Cartwright" <bencvt at gmail.com> wrote in message 
news:1141083127.970403.147100 at v46g2000cwv.googlegroups.com...
> Your evidence points to some unoptimized code in the underlying C
> implementation of Python.  As such, this should probably go to the
> python-dev list (http://mail.python.org/mailman/listinfo/python-dev).
>
> The problem is that the C library function memcmp is slow, and
> str.count calls it frequently.  See lines 2165+ in stringobject.c
> (inside function string_count):
>
>     r = 0;
>     while (i < m) {
>         if (!memcmp(s+i, sub, n)) {
>             r++;
>             i += n;
>         } else {
>             i++;
>         }
>     }
>
> This could be optimized as:
>
>     r = 0;
>     while (i < m) {
>         if (s[i] == *sub && !memcmp(s+i, sub, n)) {
>             r++;
>             i += n;
>         } else {
>             i++;
>         }
>     }
>
> This tactic typically avoids most (sometimes all) of the calls to
> memcmp.  Other string search functions, including unicode.count,
> unicode.index, and str.index, use this tactic, which is why you see
> unicode.count performing better than str.count.

If the lack of that first-character check in str.count is indeed an
oversight, a patch would be welcome (on the SF tracker).
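
For anyone who wants to experiment before writing a patch, here is a
rough standalone sketch of the guarded loop quoted above. It is not the
actual string_count code from stringobject.c: the name count_guarded,
its signature, and the handling of an empty substring (returns 0 here)
are assumptions made only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT the real string_count from stringobject.c.
   Counts non-overlapping occurrences of sub (length n) in s (length len). */
static size_t count_guarded(const char *s, size_t len,
                            const char *sub, size_t n)
{
    size_t i = 0, r = 0;

    /* Assumption for this sketch: an empty or oversized needle counts as 0. */
    if (n == 0 || n > len)
        return 0;

    /* A full match can only start where n bytes still fit in s. */
    while (i + n <= len) {
        /* Cheap first-byte test; memcmp is called only when it passes. */
        if (s[i] == sub[0] && memcmp(s + i, sub, n) == 0) {
            r++;
            i += n;   /* skip the whole match: occurrences do not overlap */
        } else {
            i++;
        }
    }
    return r;
}
```

Since && short-circuits, memcmp is skipped at every position where the
first byte already differs, which is the effect described in the quoted
message.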