[New-bugs-announce] [issue32211] Document the bug in re.findall() and re.finditer() in 2.7 and 3.6

Serhiy Storchaka report at bugs.python.org
Mon Dec 4 05:08:06 EST 2017


New submission from Serhiy Storchaka <storchaka+cpython at gmail.com>:

>>> re.findall(r'^|\w+', 'two words')
['', 'wo', 'words']
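
The missing `t` comes from the 2.7/3.6 scanning rule. After an empty match, the next search starts one character later, so no match can begin at the position where the empty match ended. Below is a minimal pure-Python emulation of that rule. It assumes that restarting `Pattern.search()` at `end + 1` after an empty match reproduces the old engine for patterns without capturing groups. `findall_pre37` is a hypothetical helper, not part of `re`.

```python
import re

def findall_pre37(pattern, string, flags=0):
    """Hypothetical emulation of the 2.7/3.6 findall() scan (no capturing groups)."""
    regex = re.compile(pattern, flags)
    pos = 0
    result = []
    while pos <= len(string):
        m = regex.search(string, pos)
        if m is None:
            break
        result.append(m.group())
        # An empty match forces the next search to start one character later,
        # so a non-empty match can never begin where the empty one ended.
        pos = m.end() + 1 if m.start() == m.end() else m.end()
    return result

print(findall_pre37(r'^|\w+', 'two words'))  # ['', 'wo', 'words']
print(re.findall(r'^|\w+', 'two words'))     # ['', 'two', 'words'] on 3.7+
```

`search()` with a `pos` argument does not treat `pos` as the start of the string, so `^` cannot match again at position 1. That is what lets the emulation skip the `t`.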

It seems the current behavior was documented incorrectly in issue732120.

It will be fixed in 3.7 (see issue1647489, issue25054), but I hesitate to backport the fix to 3.6 and 2.7 because it could break user code. For example:

In Python 3.6:

>>> list(re.finditer(r'(?m)^\s*?$', 'foo\n\n\nbar'))
[<_sre.SRE_Match object; span=(4, 4), match=''>, <_sre.SRE_Match object; span=(5, 5), match=''>]

In Python 3.7:

>>> list(re.finditer(r'(?m)^\s*?$', 'foo\n\n\nbar'))
[<re.Match object; span=(4, 4), match=''>, <re.Match object; span=(4, 5), match='\n'>, <re.Match object; span=(5, 5), match=''>]

(This is a real pattern used in the doctest module, although there it is passed to re.sub().)
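
With re.sub(), the extra '\n' match found by 3.7 changes the result. A newline is deleted, whereas 2.7 and 3.6 find only the two empty matches and return the string unchanged. The sketch below shows the difference when run on 3.7 or later, together with one possible version-independent rewrite. It assumes the intent is to blank out lines that contain only non-newline whitespace. `[^\S\n]` is an illustrative choice, not something proposed in this issue.

```python
import re

text = 'foo\n\n\nbar'

# On 3.7+ the (4, 5) match consumes a newline, so one blank line disappears;
# 2.7 and 3.6 only find the two empty matches and leave the text unchanged.
old = re.sub(r'(?m)^\s*?$', '', text)

# [^\S\n]+ matches only non-newline whitespace and can never be empty, so
# the result does not depend on how empty matches are scanned.
new = re.sub(r'(?m)^[^\S\n]+$', '', text)

print(repr(old))  # 'foo\n\nbar' on 3.7+
print(repr(new))  # 'foo\n\n\nbar'
print(repr(re.sub(r'(?m)^[^\S\n]+$', '', 'foo\n  \nbar')))  # 'foo\n\nbar'
```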

The proposed PR documents the current weird behavior in 2.7 and 3.6.
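
Code that must run on both sides of the change could probe which rule the interpreter follows instead of comparing version numbers. The sketch below assumes the 'two words' case from this report is a sufficient probe. `NEW_EMPTY_MATCH_RULE` and `spans` are illustrative names, not part of `re`.

```python
import re

# True when a non-empty match may start right after an empty match (3.7+ rule);
# False on the 2.7/3.6 scanner, which skips a character after an empty match.
NEW_EMPTY_MATCH_RULE = re.findall(r'^|\w+', 'two words') == ['', 'two', 'words']

def spans(pattern, string):
    """Return the (start, end) span of every match found by re.finditer()."""
    return [m.span() for m in re.finditer(pattern, string)]

result = spans(r'(?m)^\s*?$', 'foo\n\n\nbar')
if NEW_EMPTY_MATCH_RULE:
    expected = [(4, 4), (4, 5), (5, 5)]
else:
    expected = [(4, 4), (5, 5)]
print(NEW_EMPTY_MATCH_RULE, result == expected)
```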

----------
assignee: docs at python
components: Documentation, Regular Expressions
messages: 307546
nosy: docs at python, ezio.melotti, mrabarnett, rhettinger, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Document the bug in re.findall() and re.finditer() in 2.7 and 3.6
type: enhancement
versions: Python 2.7, Python 3.6

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue32211>
_______________________________________

