[issue41972] bytes.find consistently hangs in a particular scenario

Tim Peters report at bugs.python.org
Fri Oct 16 17:08:45 EDT 2020


Tim Peters <tim at python.org> added the comment:

I don't think Rabin-Karp is worth trying here. The PR is provably worst-case linear time, and the constant factor is already reasonably small. Its only real weakness I can see is that it can be significantly (but seemingly not dramatically) slower than the status quo, in what are exceptionally good cases for the status quo, and perhaps most often due to the increased preprocessing cost of the new code (which is relatively most damaging if the needle is found early in the haystack - in which cases preprocessing can be close to a pure waste of time).

The status quo and the PR both enjoy sublinear (O(len(haystack) / len(needle)) cases too, but R-K doesn't.

Where R-K is a "go to" choice is when an app needs to search the same large text for several strings (Wikipedia has a decent description of R-K can be adapted to that) - but Python has no string API that does such a thing (closest is joining the search-for strings via "|" for a regexp search).

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue41972>
_______________________________________


More information about the Python-bugs-list mailing list