[pypy-dev] Program slower on Pypy 7.3.3 (3.7.9) than CPython 3.9.
Carl Friedrich Bolz-Tereick
cfbolz at gmx.de
Tue Mar 16 05:27:20 EDT 2021
On 3/15/21 11:16 PM, Dan Stromberg wrote:
>
> And it's opensource, though many of the inputs are licensed.
>
> The code is at https://stromberg.dnsalias.org/~strombrg/music-pipeline/
> (https://stromberg.dnsalias.org/svn/music-pipeline/trunk/)
>
> It appears to be more than 10x slower.
>
> I haven't profiled it yet. I believe it's probably the "Blocklisting
> files..." part that's slow. That part is O(n*m), and seems to take
> forever. It's heavy on regular expressions.
>
> Are regular expressions expected to be slow on Pypy3?
Hi Dan,
Interesting problem! Single regular expressions are reasonably fast on
PyPy, since they are JIT-compiled. But I don't think we have looked into
the problem of "what if you have thousands of them" before. Your
reproducer is hitting a known, hard-to-fix corner case of the JIT: in
this case it effectively performs a linear search over all the existing
regular expressions for every match call, with catastrophic
consequences. It's in my mid-term plans to work on this problem, but
not next week.
Here's a fun workaround that improves the performance on both CPython
(by about 2x for me) and PyPy (by 10x or so): turn the many regular
expressions into a single one:
regex_strings = [f"(?:{one_regex()})" for repno in range(2_046)]
regex_compiled = re.compile("|".join(regex_strings))
then you replace the match calls with a single one:
for filename in filenames:
    if regex_compiled.match(filename):
        matches += 1
I believe you can apply the same approach to your full program?
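For reference, here is a self-contained sketch of the combining trick. The pattern list and filenames below are made up for illustration (the real reproducer generates its patterns with a helper, one_regex(), not shown here); the point is just that one big alternation gives the same match count as trying each pattern in turn:

```python
import re

# Hypothetical stand-ins for the thousands of generated patterns.
patterns = [r"track_\d+\.mp3", r".*\.flac", r"podcast_.*\.ogg"]

filenames = ["track_01.mp3", "song.flac", "notes.txt", "podcast_ep1.ogg"]

# Slow way: try each compiled pattern in turn for every filename.
compiled = [re.compile(p) for p in patterns]
slow_matches = sum(
    1 for name in filenames if any(r.match(name) for r in compiled)
)

# Fast way: join everything into one big alternation, wrapped in
# non-capturing groups so the patterns can't interfere, and match once.
combined = re.compile("|".join(f"(?:{p})" for p in patterns))
fast_matches = sum(1 for name in filenames if combined.match(name))

assert slow_matches == fast_matches == 3
```

The non-capturing `(?:...)` wrappers matter: without them, a pattern containing a top-level `|` would split across the joined alternation and change the meaning.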
Cheers,
Carl Friedrich