[issue25054] Capturing start of line '^'

Serhiy Storchaka report at bugs.python.org
Sun Nov 19 19:04:13 EST 2017


Serhiy Storchaka <storchaka+cpython at gmail.com> added the comment:

PR 4471 fixes this issue, issue1647489, and a couple of similar issues.

The most visible change is in re.split(). This is a compatibility-breaking change, and it affects third-party code. But for two Python releases, since Python 3.5, a ValueError or a FutureWarning has been raised for the patterns whose behavior changes in this PR, so developers have had enough time to fix them. In most cases the fix is as trivial as changing `*` to `+` in `\s*`.
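
As a rough illustration of that one-character fix (the sample string below is made up for this sketch and does not come from the PR):

```python
import re

text = "a b  c"  # hypothetical sample input

# `\s+` can never match an empty string, so split() gives the same
# result before and after this change, with no warning.
parts = re.split(r"\s+", text)
print(parts)  # ['a', 'b', 'c']
```

Code that really needs to split on possibly empty matches has to be reviewed by hand; this only covers the common case where `\s*` was written but `\s+` was meant.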

The changes in sub(), findall(), and finditer() are less visible. Not a single existing test needs modification for them. Before:

>>> re.split(r"\b|:+", "a::bc")
/usr/lib/python3.6/re.py:212: FutureWarning: split() requires a non-empty pattern match.
  return _compile(pattern, flags).split(string, maxsplit)
['a:', 'bc']
>>> re.sub(r"\b|:+", "-", "a::bc")
'-a-:-bc-'
>>> re.findall(r"\b|:+", "a::bc")
['', '', ':', '', '']
>>> list(re.finditer(r"\b|:+", "a::bc"))
[<_sre.SRE_Match object; span=(0, 0), match=''>, <_sre.SRE_Match object; span=(1, 1), match=''>, <_sre.SRE_Match object; span=(2, 3), match=':'>, <_sre.SRE_Match object; span=(3, 3), match=''>, <_sre.SRE_Match object; span=(5, 5), match=''>]

Fixed:

>>> re.split(r"\b|:+", "a::bc")
['', 'a', '', 'bc', '']
>>> re.sub(r"\b|:+", "-", "a::bc")
'-a--bc-'
>>> re.findall(r"\b|:+", "a::bc")
['', '', '::', '', '']
>>> list(re.finditer(r"\b|:+", "a::bc"))
[<re.Match object; span=(0, 0), match=''>, <re.Match object; span=(1, 1), match=''>, <re.Match object; span=(1, 3), match='::'>, <re.Match object; span=(3, 3), match=''>, <re.Match object; span=(5, 5), match=''>]

The behavior of re.split(), re.findall() and re.finditer() is now the same as in the regex module with the V1 flag. But the behavior of re.sub() is left closer to the previous behavior, because otherwise this would break existing tests. It is consistent with re.split() rather than with re.findall() and re.finditer(). In regex with the V1 flag, sub() is consistent with findall() and finditer(), but not with split().
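
A small sketch for checking the findall()/finditer() part on a given interpreter, using the same pattern and string as the sessions above. The variable names are only for illustration. split() and sub() are left out here, since their results depend on the exact interpreter version.

```python
import re

pattern = r"\b|:+"
text = "a::bc"

# Spans of every match, in the order finditer() reports them.
spans = [m.span() for m in re.finditer(pattern, text)]
found = re.findall(pattern, text)

print(spans)  # with this change: [(0, 0), (1, 1), (1, 3), (3, 3), (5, 5)]
print(found)  # with this change: ['', '', '::', '', '']

# findall() returns the matched text of each finditer() match.
assert found == [text[start:end] for start, end in spans]
```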

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue25054>
_______________________________________

