[issue37723] important performance regression on regular expression parsing

yannvgn report at bugs.python.org
Wed Jul 31 12:28:12 EDT 2019


yannvgn <hi at yannvgn.io> added the comment:

> Indeed, it was not expected that the character set contains hundreds of thousands items. What is its size in your real code?

> Could you please show benchmarking results for different implementations and different sizes?

I can't give a precise answer, but sacremoses (a tokenization package), for example, is strongly affected. See https://github.com/alvations/sacremoses/issues/61#issuecomment-516401853
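
For the benchmarking request, a minimal sketch of how compile time could be measured against character set size is given here. It is not taken from sacremoses or from the patch under discussion: the helper name time_charset_compile, the starting code point, and the sizes are all assumptions chosen for illustration, and no results are implied.

```python
import re
import time


def time_charset_compile(n_items, start=0x10000):
    """Return the seconds taken to compile a character class of n_items characters.

    Hypothetical helper for illustration only. The default start (0x10000)
    is an arbitrary choice that avoids regex metacharacters and surrogates.
    """
    chars = "".join(chr(start + i) for i in range(n_items))
    pattern = "[" + chars + "]"
    re.purge()  # clear the compiled-pattern cache so the pattern is parsed again
    t0 = time.perf_counter()
    re.compile(pattern)
    return time.perf_counter() - t0


if __name__ == "__main__":
    # Sizes are arbitrary; raise them to approach the set sizes discussed above.
    for n in (1_000, 5_000, 20_000):
        print(n, time_charset_compile(n))
```

Running the same script under each interpreter version being compared would show how the timing grows with the number of items.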

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue37723>
_______________________________________

