[issue35859] Capture behavior depends on the order of an alternation

Tue Mar 12 09:52:50 EDT 2019

Ma Lin <malincns at 163.com> added the comment:

> Could you please create and run some microbenchmarks to measure
> possible performance penalty of additional MARH_PUSHes? I am 
> especially interesting in worst cases.

Besides the worst case, I prepared two solutions.

Solution_A (PR12288):
Fix the bugs, I can find test-case for every changes.
I feel JUMP_MIN_UNTIL_3 should MARK_PUSH() as well, but I really can't find a test-case to prove this feel.
Commit_12 is a safety-check, if JUMP_MIN_UNTIL_3 or other JUMPs should be protected, maybe there will be a bug report come from millions of users.

Solution_B (PR12289):
Based on Solution_A, protects JUMP_MIN_UNTIL_3.

Worst (PR12290):
Based on Solution_B, protects in everywhere, this should the worst performance.

Legend of this table:
* No: no protection
* L : LASTMARK_SAVE()
* Mu: MARK_PUSH() unconditionally
* Mr: MARK_PUSH() if in a repeat

                  unpatched  Solution_A  Solution_B   Worst
JUMP_MAX_UNTIL_1     No         No          No    ->  L/Mu
JUMP_MAX_UNTIL_2     L/Mu       L/Mu        L/Mu      L/Mu
JUMP_MAX_UNTIL_3     No         No          No    ->  L/Mu

JUMP_MIN_UNTIL_1     No         No          No    ->  L/Mu
JUMP_MIN_UNTIL_2     L    ->    L/Mr        L/Mr  ->  L/Mu
JUMP_MIN_UNTIL_3     No         No     ->   L/Mu  ->  L/Mu

JUMP_REPEAT_ONE_1    L    ->    L/Mr        L/Mr  ->  L/Mu
JUMP_REPEAT_ONE_2    L    ->    L/Mr        L/Mr  ->  L/Mu

JUMP_MIN_REPEAT_ONE  L    ->    L/Mr        L/Mr  ->  L/Mu

JUMP_BRANCH          L/Mr       L/Mr        L/Mr  ->  L/Mu

JUMP_ASSERT          No         No          No    ->  L/Mu

JUMP_ASSERT_NOT      No   ->    L/Mr        L/Mr  ->  L/Mu

🔶 I made a benchmark tool for Python RE engines.
https://github.com/animalize/re_benchmarks

re100mb.py was extracted from a real case, process 100MB real data in 16 rounds (with 16 patterns).
This table picks best of 4 results from each round:

             unpatched  A        B        Worst    regex 2019.3.9
test 01 best:  0.647    0.629    0.625    0.635    0.102
test 02 best:  3.980    4.046    4.052    4.005    4.530
test 03 best:  0.730    0.708    0.709    0.706    0.433
test 04 best:  0.763    0.743    0.740    0.736    0.799
test 05 best:  2.566    2.693    2.692    2.654    0.981
test 06 best:  0.883    0.865    0.859    0.874    0.872
test 07 best:  2.847    2.773    2.797    2.890    1.202
test 08 best:  3.644    3.664    3.740    3.699    1.139
test 09 best:  0.344    0.348    0.343    0.345    0.378
test 10 best:  0.341    0.347    0.343    0.344    0.407
test 11 best:  4.490    4.655    4.520    4.440    1.340
test 12 best:  0.264    0.263    0.262    0.264    0.271
test 13 best:  0.230    0.230    0.231    0.233    0.281
test 14 best:  0.932    0.925    0.919    0.943    0.334
test 15 best:  1.837    1.815    1.804    1.866    0.683
test 16 best:  1.691    1.708    1.676    2.042    3.805
--------------------------------------------------------
sum of above: 26.189   26.412   26.312   26.676   17.557

There seems no significant changes.

🔶 Below are some micro benchmarks, run t.py with the benchmark tool testit.py

SRE_OP_MAX_UNTIL a few repeats
unpatched: 744.85 nsec
SolutionA: 755.86 nsec
SolutionB: 745.14 nsec
Worst:     843.56 nsec
regex:     2.45 usec

SRE_OP_MAX_UNTIL many repeats
unpatched: 393.24 msec
SolutionA: 321.16 msec
SolutionB: 323.24 msec
Worst:     498.48 msec
regex:     240.73 msec

------------------
SRE_OP_MIN_UNTIL a few repeats
unpatched: 702.75 nsec
SolutionA: 701.90 nsec
SolutionB: 730.81 nsec
Worst:     873.67 nsec
regex:     1.84 usec

SRE_OP_MIN_UNTIL many repeats
unpatched: 210.60 msec
SolutionA: 174.20 msec
SolutionB: 323.93 msec
Worst:     491.73 msec
regex:     195.11 msec

------------------
SRE_OP_REPEAT_ONE short string
unpatched: 466.56 nsec
SolutionA: 468.85 nsec
SolutionB: 466.04 nsec
Worst:     463.86 nsec
regex:     1.13 usec

SRE_OP_REPEAT_ONE long string
unpatched: 2.19 msec
SolutionA: 2.19 msec
SolutionB: 2.19 msec
Worst:     2.19 msec
regex:     3.23 msec

------------------
SRE_OP_MIN_REPEAT_ONE short string
unpatched: 563.76 nsec
SolutionA: 566.68 nsec
SolutionB: 561.92 nsec
Worst:     601.69 nsec
regex:     1.12 usec

SRE_OP_MIN_REPEAT_ONE long string
unpatched: 11.16 msec
SolutionA: 11.27 msec
SolutionB: 10.55 msec
Worst:     15.97 msec
regex:     7.24 msec

------------------
SRE_OP_BRANCH
unpatched: 419.34 nsec
SolutionA: 422.07 nsec
SolutionB: 418.25 nsec
Worst:     423.56 nsec
regex:     1.34 usec

------------------
SRE_OP_ASSERT
unpatched: 320.58 nsec
SolutionA: 326.29 nsec
SolutionB: 320.81 nsec
Worst:     333.22 nsec
regex:     1.14 usec

SRE_OP_ASSERT_NOT
unpatched: 440.18 nsec
SolutionA: 440.93 nsec
SolutionB: 434.44 nsec
Worst:     446.33 nsec
regex:     1.00 usec

🔶 reduce the size of match_context struct

In stack-consuming scenes, Solution_A and Solution_B are better than unpatched.
This is because commit 10 and commit 11, they reduced the size of match_context struct, the stack uses this struct to simulate recursive call.

On 32 bit platform, sizeof(match_context): 36 bytes -> 32 bytes.
On 64 bit platform, sizeof(match_context): 72 bytes -> 56 bytes.

It brings:
- deeper recursive call
- less memory consume
- less memory realloc

If limit the stack size to 1GB, the max value of n is:

re.match(r'(ab)*', n * 'ab')   # need to save MARKs
72 bytes: n = 11,184,809
64 bytes: n = 12,201,610
56 bytes: n = 13,421,771

re.match(r'(?:ab)*', n * 'ab') # no need to save MARKs
72 bytes: n = 13,421,770
64 bytes: n = 14,913,079
56 bytes: n = 16,777,214

🔶 Future optimizations
> If the penalty is significant, it will be a goal of future optimizations.

Maybe these two questions can be predicted when compiling the pattern:
- whether protect or not
- which MARKs should be protected
Then sre don't need to MARK_PUSH() all available MARKs to stack.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35859>
_______________________________________