[Speed] Performance comparison of regular expression engines

Serhiy Storchaka storchaka at gmail.com
Sun Mar 6 04:21:56 EST 2016


On 06.03.16 09:14, Maciej Fijalkowski wrote:
> Any chance you can rerun this on pypy?

Results on PyPy 2.2.1 (I'm not sure I could build the latest PyPy on my computer):

                                               re str.find

Twain                                   5   5.469  3.852
(?i)Twain                              10   8.646
[a-z]shing                            165   17.24
Huck[a-zA-Z]+|Saw[a-zA-Z]+             52   7.763
\b\w+nn\b                              32     101
[a-q][^u-z]{13}x                      445   167.6
Tom|Sawyer|Huckleberry|Finn           314   8.583
(?i)Tom|Sawyer|Huckleberry|Finn       477    16.3
.{0,2}(Tom|Sawyer|Huckleberry|Finn)   314   270.9
.{2,4}(Tom|Sawyer|Huckleberry|Finn)   237     262
Tom.{10,25}river|river.{10,25}Tom       1   8.461
[a-zA-Z]+ing                        10079     348
\s[a-zA-Z]{0,12}ing\s                7160   115.8
([A-Za-z]awyer|[A-Za-z]inn)\s          50   16.62
["'][^"']{0,30}[?!\.]["']            1618   14.45

Alternative regular expression engines need extension modules and don't work on PyPy for me.
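
A minimal sketch of how a benchmark script might check which engines are importable on a given interpreter before running. It assumes the column names used in the CPython table below (regex, re2, pcre) are also the import names of the corresponding extension modules; that is a guess, not something stated in this message, and the helper name available_engines is hypothetical.

```python
import importlib

# Assumed import names, taken from the table column headers.
# Only "re" is guaranteed to exist; the others are third-party
# extension modules that may be missing or fail to build.
CANDIDATES = ["re", "regex", "re2", "pcre"]


def available_engines(names=CANDIDATES):
    """Return a dict mapping engine name to imported module, skipping any that fail to import."""
    engines = {}
    for name in names:
        try:
            engines[name] = importlib.import_module(name)
        except ImportError:
            # The engine is not installed or not usable on this interpreter.
            continue
    return engines


if __name__ == "__main__":
    print("usable engines:", ", ".join(sorted(available_engines())))
```

On an interpreter where the extension modules cannot be loaded, this would report only re, which matches the situation described above.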

For comparison, results on CPython 2.7.11+:

                                               re  regex    re2   pcre str.find

Twain                                   5   4.423  2.699  8.045   93.4  4.181
(?i)Twain                              10   50.07  3.563  20.35  185.6
[a-z]shing                            165   98.68  6.365  23.71   2886
Huck[a-zA-Z]+|Saw[a-zA-Z]+             52   58.97  50.26  19.52   1016
\b\w+nn\b                              32   130.1  416.5  18.38  740.7
[a-q][^u-z]{13}x                      445   406.6  7.935   5886   7137
Tom|Sawyer|Huckleberry|Finn           314   53.09   59.1  20.33   5377
(?i)Tom|Sawyer|Huckleberry|Finn       477   281.2  338.5  23.77   7895
.{0,2}(Tom|Sawyer|Huckleberry|Finn)   314   419.5   1142  20.69   6423
.{2,4}(Tom|Sawyer|Huckleberry|Finn)   237   410.9   1013  18.99   5224
Tom.{10,25}river|river.{10,25}Tom       1   63.17  58.31  18.94  260.2
[a-zA-Z]+ing                        10079   203.8  363.8  43.78 1.583e+05
\s[a-zA-Z]{0,12}ing\s                7160   127.1  26.65  34.23 1.114e+05
([A-Za-z]awyer|[A-Za-z]inn)\s          50   147.6  412.4  21.57   1172
["'][^"']{0,30}[?!\.]["']            1618   85.88  86.55  22.22 2.576e+04

And on Jython 2.5.3 with JRE 7:

                                               re str.find

Twain                                   5      34      3
(?i)Twain                              10     251
[a-z]shing                            165     564
Huck[a-zA-Z]+|Saw[a-zA-Z]+             52     281
\b\w+nn\b                              32     510
[a-q][^u-z]{13}x                      445    1786
Tom|Sawyer|Huckleberry|Finn           314     102
(?i)Tom|Sawyer|Huckleberry|Finn       477    1232
.{0,2}(Tom|Sawyer|Huckleberry|Finn)   314    1345
.{2,4}(Tom|Sawyer|Huckleberry|Finn)   237    1353
Tom.{10,25}river|river.{10,25}Tom       1     305
[a-zA-Z]+ing                        10079    1211
\s[a-zA-Z]{0,12}ing\s                7160     571
([A-Za-z]awyer|[A-Za-z]inn)\s          50     676
["'][^"']{0,30}[?!\.]["']            1618     431
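
For anyone who wants to get a comparable re versus str.find pair for a literal pattern such as Twain on their own interpreter, here is a rough sketch. It is not the script that produced the tables above: the sample text, the helper names (count_re, count_find, best_ms) and the choice of milliseconds are all assumptions made for illustration, and the input corpus used for the tables is not shown in this message.

```python
import re
import time


def count_re(pattern, text):
    """Count non-overlapping matches of a regular expression."""
    return sum(1 for _ in re.finditer(pattern, text))


def count_find(needle, text):
    """Count non-overlapping occurrences of a literal string using str.find."""
    count = 0
    pos = text.find(needle)
    while pos != -1:
        count += 1
        pos = text.find(needle, pos + len(needle))
    return count


def best_ms(func, *args, repeat=5):
    """Run func(*args) `repeat` times; return (last result, best time in ms)."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, (time.perf_counter() - start) * 1000.0)
    return result, best


# Hypothetical sample text, not the corpus behind the tables above.
SAMPLE = "Mark Twain wrote about Tom Sawyer and Huckleberry Finn. " * 1000

if __name__ == "__main__":
    for name, func in [("re", count_re), ("str.find", count_find)]:
        matches, ms = best_ms(func, "Twain", SAMPLE)
        print("%-8s matches=%d best=%.3f ms" % (name, matches, ms))
```

Taking the best of several runs reduces noise from other processes; absolute numbers will differ from the tables because the input text and machine are different.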



