[issue35859] Capture behavior depends on the order of an alternation
Ma Lin
report at bugs.python.org
Tue Mar 12 09:52:50 EDT 2019
Ma Lin <malincns at 163.com> added the comment:
> Could you please create and run some microbenchmarks to measure
> possible performance penalty of additional MARH_PUSHes? I am
> especially interesting in worst cases.
Besides the worst case, I prepared two solutions.
Solution_A (PR12288):
Fix the bugs, I can find test-case for every changes.
I feel JUMP_MIN_UNTIL_3 should MARK_PUSH() as well, but I really can't find a test-case to prove this feel.
Commit_12 is a safety-check, if JUMP_MIN_UNTIL_3 or other JUMPs should be protected, maybe there will be a bug report come from millions of users.
Solution_B (PR12289):
Based on Solution_A, protects JUMP_MIN_UNTIL_3.
Worst (PR12290):
Based on Solution_B, protects in everywhere, this should the worst performance.
Legend of this table:
* No: no protection
* L : LASTMARK_SAVE()
* Mu: MARK_PUSH() unconditionally
* Mr: MARK_PUSH() if in a repeat
unpatched Solution_A Solution_B Worst
JUMP_MAX_UNTIL_1 No No No -> L/Mu
JUMP_MAX_UNTIL_2 L/Mu L/Mu L/Mu L/Mu
JUMP_MAX_UNTIL_3 No No No -> L/Mu
JUMP_MIN_UNTIL_1 No No No -> L/Mu
JUMP_MIN_UNTIL_2 L -> L/Mr L/Mr -> L/Mu
JUMP_MIN_UNTIL_3 No No -> L/Mu -> L/Mu
JUMP_REPEAT_ONE_1 L -> L/Mr L/Mr -> L/Mu
JUMP_REPEAT_ONE_2 L -> L/Mr L/Mr -> L/Mu
JUMP_MIN_REPEAT_ONE L -> L/Mr L/Mr -> L/Mu
JUMP_BRANCH L/Mr L/Mr L/Mr -> L/Mu
JUMP_ASSERT No No No -> L/Mu
JUMP_ASSERT_NOT No -> L/Mr L/Mr -> L/Mu
🔶 I made a benchmark tool for Python RE engines.
https://github.com/animalize/re_benchmarks
re100mb.py was extracted from a real case, process 100MB real data in 16 rounds (with 16 patterns).
This table picks best of 4 results from each round:
unpatched A B Worst regex 2019.3.9
test 01 best: 0.647 0.629 0.625 0.635 0.102
test 02 best: 3.980 4.046 4.052 4.005 4.530
test 03 best: 0.730 0.708 0.709 0.706 0.433
test 04 best: 0.763 0.743 0.740 0.736 0.799
test 05 best: 2.566 2.693 2.692 2.654 0.981
test 06 best: 0.883 0.865 0.859 0.874 0.872
test 07 best: 2.847 2.773 2.797 2.890 1.202
test 08 best: 3.644 3.664 3.740 3.699 1.139
test 09 best: 0.344 0.348 0.343 0.345 0.378
test 10 best: 0.341 0.347 0.343 0.344 0.407
test 11 best: 4.490 4.655 4.520 4.440 1.340
test 12 best: 0.264 0.263 0.262 0.264 0.271
test 13 best: 0.230 0.230 0.231 0.233 0.281
test 14 best: 0.932 0.925 0.919 0.943 0.334
test 15 best: 1.837 1.815 1.804 1.866 0.683
test 16 best: 1.691 1.708 1.676 2.042 3.805
--------------------------------------------------------
sum of above: 26.189 26.412 26.312 26.676 17.557
There seems no significant changes.
🔶 Below are some micro benchmarks, run t.py with the benchmark tool testit.py
SRE_OP_MAX_UNTIL a few repeats
unpatched: 744.85 nsec
SolutionA: 755.86 nsec
SolutionB: 745.14 nsec
Worst: 843.56 nsec
regex: 2.45 usec
SRE_OP_MAX_UNTIL many repeats
unpatched: 393.24 msec
SolutionA: 321.16 msec
SolutionB: 323.24 msec
Worst: 498.48 msec
regex: 240.73 msec
------------------
SRE_OP_MIN_UNTIL a few repeats
unpatched: 702.75 nsec
SolutionA: 701.90 nsec
SolutionB: 730.81 nsec
Worst: 873.67 nsec
regex: 1.84 usec
SRE_OP_MIN_UNTIL many repeats
unpatched: 210.60 msec
SolutionA: 174.20 msec
SolutionB: 323.93 msec
Worst: 491.73 msec
regex: 195.11 msec
------------------
SRE_OP_REPEAT_ONE short string
unpatched: 466.56 nsec
SolutionA: 468.85 nsec
SolutionB: 466.04 nsec
Worst: 463.86 nsec
regex: 1.13 usec
SRE_OP_REPEAT_ONE long string
unpatched: 2.19 msec
SolutionA: 2.19 msec
SolutionB: 2.19 msec
Worst: 2.19 msec
regex: 3.23 msec
------------------
SRE_OP_MIN_REPEAT_ONE short string
unpatched: 563.76 nsec
SolutionA: 566.68 nsec
SolutionB: 561.92 nsec
Worst: 601.69 nsec
regex: 1.12 usec
SRE_OP_MIN_REPEAT_ONE long string
unpatched: 11.16 msec
SolutionA: 11.27 msec
SolutionB: 10.55 msec
Worst: 15.97 msec
regex: 7.24 msec
------------------
SRE_OP_BRANCH
unpatched: 419.34 nsec
SolutionA: 422.07 nsec
SolutionB: 418.25 nsec
Worst: 423.56 nsec
regex: 1.34 usec
------------------
SRE_OP_ASSERT
unpatched: 320.58 nsec
SolutionA: 326.29 nsec
SolutionB: 320.81 nsec
Worst: 333.22 nsec
regex: 1.14 usec
SRE_OP_ASSERT_NOT
unpatched: 440.18 nsec
SolutionA: 440.93 nsec
SolutionB: 434.44 nsec
Worst: 446.33 nsec
regex: 1.00 usec
🔶 reduce the size of match_context struct
In stack-consuming scenes, Solution_A and Solution_B are better than unpatched.
This is because commit 10 and commit 11, they reduced the size of match_context struct, the stack uses this struct to simulate recursive call.
On 32 bit platform, sizeof(match_context): 36 bytes -> 32 bytes.
On 64 bit platform, sizeof(match_context): 72 bytes -> 56 bytes.
It brings:
- deeper recursive call
- less memory consume
- less memory realloc
If limit the stack size to 1GB, the max value of n is:
re.match(r'(ab)*', n * 'ab') # need to save MARKs
72 bytes: n = 11,184,809
64 bytes: n = 12,201,610
56 bytes: n = 13,421,771
re.match(r'(?:ab)*', n * 'ab') # no need to save MARKs
72 bytes: n = 13,421,770
64 bytes: n = 14,913,079
56 bytes: n = 16,777,214
🔶 Future optimizations
> If the penalty is significant, it will be a goal of future optimizations.
Maybe these two questions can be predicted when compiling the pattern:
- whether protect or not
- which MARKs should be protected
Then sre don't need to MARK_PUSH() all available MARKs to stack.
----------
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35859>
_______________________________________
More information about the Python-bugs-list
mailing list