[issue24426] re.split performance degraded significantly by capturing group
Patrick Maupin
report at bugs.python.org
Wed Jun 10 20:53:09 CEST 2015
New submission from Patrick Maupin:
The addition of a capturing group in a re.split() pattern, e.g. using '(\n)' instead of '\n', causes a factor of 10 performance degradation.
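For readers unfamiliar with the difference: when the pattern passed to re.split() contains a capturing group, the text matched by the group is also returned as part of the resulting list. The sketch below (a hypothetical three-line input, not taken from the report) shows the two result shapes:

```python
import re

text = "alpha\nbeta\ngamma"

# Without a group, the matched separators are discarded.
plain = re.split(r"\n", text)
# With a capturing group, each matched separator is kept in the result list.
grouped = re.split(r"(\n)", text)

print(plain)    # ['alpha', 'beta', 'gamma']
print(grouped)  # ['alpha', '\n', 'beta', '\n', 'gamma']
```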
I use re.split() a lot but had never noticed the issue before. It was extremely noticeable with 1000 patterns in a 5 GB file, though, requiring 40 seconds instead of 4.
I have attached a script demonstrating the issue. I have tested on 2.7 and 3.4, but have no reason to believe it doesn't exist in other versions as well.
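The attached splitter2.py is not reproduced here. As a rough, hypothetical stand-in, a comparison along these lines can be timed with the standard timeit module; the synthetic input and the helper name time_split are assumptions for illustration only, and the measured ratio will vary by Python version and machine:

```python
import re
import timeit

# Hypothetical synthetic input; NOT the data used in splitter2.py.
text = "some words on a line\n" * 20_000

def time_split(pattern, text, number=5):
    """Return the best of 3 repeats, each timing `number` calls of regex.split(text)."""
    regex = re.compile(pattern)
    timer = timeit.Timer(lambda: regex.split(text))
    return min(timer.repeat(repeat=3, number=number))

if __name__ == "__main__":
    plain = time_split(r"\n", text)
    grouped = time_split(r"(\n)", text)
    print(f"'\\n'   : {plain:.3f} s")
    print(f"'(\\n)' : {grouped:.3f} s")
    print(f"ratio  : {grouped / plain:.1f}x")
```

Both patterns split on the same separator, so any difference in timing comes from the capturing group alone.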
Thanks,
Pat
----------
components: Regular Expressions
files: splitter2.py
messages: 245137
nosy: Patrick Maupin, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: re.split performance degraded significantly by capturing group
type: performance
versions: Python 2.7, Python 3.4
Added file: http://bugs.python.org/file39676/splitter2.py
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue24426>
_______________________________________