[Python-checkins] python/dist/src/Objects stringobject.c, 2.227,
2.228
rhettinger at users.sourceforge.net
rhettinger at users.sourceforge.net
Sun Feb 20 05:07:41 CET 2005
Update of /cvsroot/python/python/dist/src/Objects
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21408/Objects
Modified Files:
stringobject.c
Log Message:
* Beef-up testing of str.__contains__() and str.find().
* Speed-up "x in y" where x has more than one character.
The existing code made excessive calls to the expensive memcmp() function.
The new code uses memchr() to rapidly find a start point for memcmp().
In addition to knowing that the first character is a match, the new code
also checks that the last character is a match. This significantly reduces
the incidence of false starts (saving memcmp() calls and making quadratic
behavior less likely).
Improves the timings on:
python -m timeit -r7 -s"x='a'*1000" "'ab' in x"
python -m timeit -r7 -s"x='a'*1000" "'bc' in x"
Once this code has proven itself, then string_find_internal() should refer
to it rather than running its own version. Also, something similar may
apply to unicode objects.
Index: stringobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/stringobject.c,v
retrieving revision 2.227
retrieving revision 2.228
diff -u -d -r2.227 -r2.228
--- stringobject.c 31 Jan 2005 17:09:25 -0000 2.227
+++ stringobject.c 20 Feb 2005 04:07:04 -0000 2.228
@@ -1002,8 +1002,12 @@
static int
string_contains(PyObject *a, PyObject *el)
{
- const char *lhs, *rhs, *end;
- int size;
+ char *s = PyString_AS_STRING(a);
+ const char *sub = PyString_AS_STRING(el);
+ char *last;
+ int len_sub = PyString_GET_SIZE(el);
+ int shortsub;
+ char firstchar, lastchar;
if (!PyString_CheckExact(el)) {
#ifdef Py_USING_UNICODE
@@ -1016,20 +1020,29 @@
return -1;
}
}
- size = PyString_GET_SIZE(el);
- rhs = PyString_AS_STRING(el);
- lhs = PyString_AS_STRING(a);
-
- /* optimize for a single character */
- if (size == 1)
- return memchr(lhs, *rhs, PyString_GET_SIZE(a)) != NULL;
- end = lhs + (PyString_GET_SIZE(a) - size);
- while (lhs <= end) {
- if (memcmp(lhs++, rhs, size) == 0)
+ if (len_sub == 0)
+ return 1;
+ /* last points to one char beyond the start of the rightmost
+ substring. When s<last, there is still room for a possible match
+ and s[0] through s[len_sub-1] will be in bounds.
+ shortsub is len_sub minus the last character which is checked
+ separately just before the memcmp(). That check helps prevent
+ false starts and saves the setup time for memcmp().
+ */
+ firstchar = sub[0];
+ shortsub = len_sub - 1;
+ lastchar = sub[shortsub];
+ last = s + PyString_GET_SIZE(a) - len_sub + 1;
+ while (s < last) {
+ s = memchr(s, firstchar, last-s);
+ if (s == NULL)
+ return 0;
+ assert(s < last);
+ if (s[shortsub] == lastchar && memcmp(s, sub, shortsub) == 0)
return 1;
+ s++;
}
-
return 0;
}
More information about the Python-checkins
mailing list