[issue41972] bytes.find consistently hangs in a particular scenario

Tim Peters report at bugs.python.org
Sat Oct 17 22:09:49 EDT 2020


Tim Peters <tim at python.org> added the comment:

Yup, they act essentially the same, but yours jumps into the quicksand earlier ;-)

I'm fond of this one:

"""
HUGE = 10**7
BIG = 10**6

bigxs = 'x' * BIG

haystack = 'x' * HUGE
needle = bigxs + 'y' + bigxs
"""

"The problem" then is in middle of the needle, not at either end, so really simple tricks all fail. For example, set(haystack) is a subset of set(needle), so our "Bloom filter" is useless, and needle[-1] == needle[-2], so our "skip" count is the useless 0.  Pure brute force is quadratic-time, and we're even slower because we're paying over & over to try heuristic speedups that happen never to pay off.

Two-way splits the needle as

u = bigxs
v = 'y' + bigxs

and again never gets out of the "try to match the right part" phase. Each time around it fails to match 'y' on the first try, and shifts right by 1.  Boring, but linear time overall.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue41972>
_______________________________________


More information about the Python-bugs-list mailing list