When Python outruns C++ ...
Julian Tibble
chasm at rift.sytes.net
Tue Apr 1 08:58:27 EST 2003
In article <mailman.1049181423.2644.python-list at python.org>, Jp Calderone wrote:
> It looks like this could easily be improved with the application of a
> slightly smarter searching algorithm, such as Rabin-Karp search or
> Boyer-Moore find.
>
> Both of these use a technique of doing a little extra work up front to
> make each pass through the loop much less expensive. Google for details.
I can't see why. Maybe for more complex patterns, but not for just finding
words.
All you have to do to match words is go through the file character by
character with a simple finite state machine. The C snippet I used in
my comparison program was as follows:
/* C extract to count occurences of words
* ht = hash table handle
* t is one past the last character (set to '\0' in case there is a
* word right at the end)
*/
for (i = 0; i <= t; i++) {
if (isalpha(buf[i])) {
if (!in_word) {
in_word = 1;
start = &buf[i];
}
buf[i] = tolower(buf[i]);
} else {
if (in_word) {
buf[i] = '\0';
in_word = 0;
hash_countword(ht, start);
}
}
}
This is clearly O(n)
Rabin-Karp pattern search algorithm is an alternative to trying to match
a pattern at every position, which I'm not doing.
Julian
More information about the Python-list
mailing list