[issue24426] re.split performance degraded significantly by capturing group

Patrick Maupin report at bugs.python.org
Wed Jun 10 20:53:09 CEST 2015


New submission from Patrick Maupin:

The addition of a capturing group in a re.split() pattern, e.g. using '(\n)' instead of '\n', causes a factor of 10 performance degradation.

I use re.split() a lot, but never noticed the issue before.  It was extremely noticeable on 1000 patterns in a 5GB file, though, requiring 40 seconds instead of 4.

I have attached a script demonstrating the issue.  I have tested on 2.7 and 3.4, but have no reason to believe it doesn't exist on other versions as well.
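
For readers without the attachment, the comparison can be sketched roughly as follows. This is not the attached splitter2.py, whose contents are not reproduced in this message; the test string, its size, and the helper names split_plain and split_capturing are assumptions made for illustration, and the measured ratio will vary by Python version and machine.

```python
import re
import timeit

# Hypothetical test data: many short lines. This stands in for the large
# file mentioned in the report and is far smaller.
text = "some line of text\n" * 20000

plain = re.compile(r'\n')        # separator is discarded
capturing = re.compile(r'(\n)')  # separator is kept in the result list


def split_plain():
    return plain.split(text)


def split_capturing():
    return capturing.split(text)


# Sanity check: with a capturing group the separators are interleaved with
# the pieces, so every second element matches the plain split.
assert split_capturing()[::2] == split_plain()

# Time both variants; no particular ratio is assumed here.
t_plain = timeit.timeit(split_plain, number=5)
t_capturing = timeit.timeit(split_capturing, number=5)
print("plain:     %.4f s" % t_plain)
print("capturing: %.4f s" % t_capturing)
```

The sanity check matters because the two calls do not return the same list: the capturing version returns roughly twice as many elements, so the timings compare two slightly different amounts of work.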

Thanks,
Pat

----------
components: Regular Expressions
files: splitter2.py
messages: 245137
nosy: Patrick Maupin, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: re.split performance degraded significantly by capturing group
type: performance
versions: Python 2.7, Python 3.4
Added file: http://bugs.python.org/file39676/splitter2.py

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue24426>
_______________________________________

