The split() function of Python's built-in module has changed in a puzzling way - is this a bug?

Thomas Jollans tjol at tjol.eu
Fri Apr 23 03:52:41 EDT 2021


On 23/04/2021 01:53, Andy AO wrote:
> Upgrading from Python 3.6.8 to Python 3.9.0 and executing unit tests
> revealed a significant change in the behavior of re.split().
>
> However, the relevant documentation (the Changelog
> <https://docs.python.org/3/whatsnew/changelog.html> and re - Regular expression
> operations - Python 3.9.4 documentation
> <https://docs.python.org/3/library/re.html?highlight=re%20search#re.split>)
> does not seem to mention any such change.
>
> number = '123'
> def test_Asterisk_quantifier_with_capture_group(self):
>      resultList = re.split(r'(\d*)', self.number)
>      if platform.python_version() == '3.6.8':
>          self.assertEqual(resultList,['', '123', ''])
>
>      else:
>          self.assertEqual(resultList,['', '123', '', '', ''])


Hi Andy,

That's interesting. The old result is less surprising, but both are 
defensible: the 4th element of the new result is an empty string, and 
your regexp does match the empty string.

The oldest version of Python I had lying around to test is 3.7; that has 
the same behaviour as 3.9.

I suspect that this behaviour is related to the following note in the 
docs for re.split:


Changed in version 3.7: Added support of splitting on a pattern that 
could match an empty string.


(your pattern can match an empty string, so I suppose it wasn't 
technically supported in 3.6?)
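
To see where the extra elements come from, it may help to list every
match of the pattern. A minimal sketch, assuming Python 3.7 or later
(the variable names are just for illustration):

```python
import re

text = '123'

# Every match of the pattern is a split point. Besides '123' at 0..3,
# the pattern also matches the empty string at the very end (3..3).
matches = [(m.span(), m.group(1)) for m in re.finditer(r'(\d*)', text)]
print(matches)  # [((0, 3), '123'), ((3, 3), '')]

# Two split points with a capture group give five list elements:
# text before, capture, text between, capture, text after.
star = re.split(r'(\d*)', text)
print(star)     # ['', '123', '', '', '']

# \d+ cannot match an empty string, so there is only one split point.
plus = re.split(r'(\d+)', text)
print(plus)     # ['', '123', '']
```

The outputs in the comments are what 3.7+ prints; 3.6 would give
['', '123', ''] for the \d* split, as in your test.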


-- Thomas



>
> I feel that this is clearly not in line with the description of the
> function in the split documentation, and it is also strange that after
> replacing * with +, the behavior is still the same as in 3.6.8.
>
>     1. Why is this change not in the documentation? Is it because I didn’t
>     find it?
>     2. Why did the behavior change this way? Was a bug introduced, or was it
>     a bug fix?
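
On the + variant: \d+ cannot match an empty string, so the 3.7 change
to empty-match handling should not affect it, which would explain why
that result is the same as in 3.6.8. If the test has to pass on both
sides of the change, one option is to key on the version boundary
rather than on one exact version string. A rough sketch; the helper
name expected_star_split is made up, and the 3.7 cutoff is taken from
the docs note rather than from testing every release:

```python
import re
import sys
import unittest

NUMBER = '123'


def expected_star_split(version_info=sys.version_info):
    # Hypothetical helper: expected result of splitting '123' on (\d*).
    # Assumes the behaviour changed in 3.7, per the re.split docs note.
    if tuple(version_info) >= (3, 7):
        return ['', '123', '', '', '']
    return ['', '123', '']


class SplitTests(unittest.TestCase):
    def test_asterisk_quantifier_with_capture_group(self):
        result = re.split(r'(\d*)', NUMBER)
        self.assertEqual(result, expected_star_split())

    def test_plus_quantifier_with_capture_group(self):
        # No empty matches are possible, so no version check is needed.
        result = re.split(r'(\d+)', NUMBER)
        self.assertEqual(result, ['', '123', ''])
```

Run it with python -m unittest in the usual way.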


