[issue41972] bytes.find consistently hangs in a particular scenario
Dennis Sweeney
report at bugs.python.org
Sun Oct 18 21:24:36 EDT 2020
Dennis Sweeney <sweeney.dennis650 at gmail.com> added the comment:
Below is one of the tests that got run when I happened to import something, and I thought it was a good illustration of the Boyer-Moore bad character shift table.
It's worth noting in particular that the table is the dominant force for speed in some common cases, with the two-way stuff only ever being checked once in this example. The shift table can be defeated with pathological strings, and that's where the two-way stuff begins to shine.
Checking " 32 bit (ARM)" in "3.10.0a1+ (heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]".
========================
Two-way with needle=" 32 bit (ARM)" and haystack="(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]"
Split " 32 bit (ARM)" into " 32 bit" and " (ARM)".
needle is NOT completely periodic.
Using period 8.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]"
> " 32 bit (ARM)"
Last character not found in string.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]"
> " 32 bit (ARM)"
Table says shift by 11.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]"
> " 32 bit (ARM)" # Made the '3's line up
Table says shift by 5.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]"
> " 32 bit (ARM)" # Here made the spaces line up
Last character not found in string.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]"
> " 32 bit (ARM)" # Made the spaces line up
Checking the right half.
No match.
Jump forward without checking left half.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]"
> " 32 bit (ARM)" # Made the spaces line up
Table says shift by 5.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]"
> " 32 bit (ARM)" # Made the spaces line up
Table says shift by 5.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]"
> " 32 bit (ARM)" # Made the spaces line up
Table says shift by 10.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]"
> " 32 bit (ARM)" # Made the spaces line up
Table says shift by 4.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]"
> " 32 bit (ARM)" # Made the lparens line up
Last character not found in string.
Reached end. Returning -1.
----------
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue41972>
_______________________________________
More information about the Python-bugs-list
mailing list