[New-bugs-announce] [issue36397] re.split() incorrectly splitting on zero-width pattern
Elias Tarhini
report at bugs.python.org
Thu Mar 21 22:48:42 EDT 2019
New submission from Elias Tarhini <eltrhn at gmail.com>:
I believe I've found a bug in the `re` module -- specifically, in the 3.7+ support for splitting on zero-width patterns. Compare Java's behavior...
jshell> "1211".split("(?<=(\\d))(?!\\1)(?=\\d)");
$1 ==> String[3] { "1", "2", "11" }
...with Python's:
>>> re.split(r'(?<=(\d))(?!\1)(?=\d)', '1211')
['1', '1', '2', '2', '11']
(The pattern itself is pretty straightforward in design, but regex syntax can cloud things, so to be totally clear: it finds any point that follows a digit and precedes a *different* digit.)
* Tested on Python 3.7.1 (Windows 10) and Python 3.7.0 (Linux).
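
To make the comparison easier to reproduce, here is a minimal sketch that lists where the pattern matches in '1211' and what group 1 captures at each match, then slices the string by hand at those positions. The names `positions`, `captured`, `bounds` and `pieces` are illustrative only and do not appear in the report. This is not a diagnosis. It may be relevant, though, that `re.split()` is documented to include the text of capturing groups in its result, which would account for the extra items.

```python
import re

# Pattern from the report: a point that follows a digit and
# precedes a *different* digit.
pattern = re.compile(r'(?<=(\d))(?!\1)(?=\d)')
text = '1211'

matches = list(pattern.finditer(text))
positions = [m.start() for m in matches]  # offsets of the zero-width matches
captured = [m.group(1) for m in matches]  # text captured by group 1 at each match

# Slice the text by hand at the match offsets (illustrative logic only).
bounds = [0] + positions + [len(text)]
pieces = [text[a:b] for a, b in zip(bounds, bounds[1:])]

print(positions)
print(captured)
print(pieces)
```

With exactly one capturing group in the pattern, taking every second item of the `re.split()` result (`re.split(pattern, text)[::2]`) should give the same pieces as the manual slicing, assuming Python 3.7 or later.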
----------
components: Regular Expressions
messages: 338581
nosy: Elias Tarhini, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: re.split() incorrectly splitting on zero-width pattern
type: behavior
versions: Python 3.7
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue36397>
_______________________________________