Possible re bug when using ".*"

Roel Schroeven roel at roelschroeven.net
Wed Dec 28 13:59:09 EST 2022


Alexander Richert - NOAA Affiliate via Python-list wrote on 28/12/2022 
at 19:42:
> In a couple of recent versions of Python (including 3.8 and 3.10), the
> following code:
> import re
> print(re.sub(".*", "replacement", "pattern"))
> yields the output "replacementreplacement".
>
> This behavior does not occur in 3.6.
>
> Which behavior is the desired one? Perhaps relatedly, I noticed that even
> in 3.6, the code
> print(re.findall(".*","pattern"))
> yields ['pattern',''] which is not what I was expecting.

The documentation for re.sub() and re.findall() has these notes, 
respectively: "Changed in version 3.7: Empty matches for the pattern are 
replaced when adjacent to a previous non-empty match." and "Changed in 
version 3.7: Non-empty matches can now start just after a previous empty 
match."
That probably describes the behavior you're seeing. ".*" first matches 
"pattern", which is a non-empty match; then it matches the empty string 
at the end, which is an empty match but is replaced because it is 
adjacent to a non-empty match.
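
To make the two matches visible, here is a minimal sketch using 
re.finditer() to list each match with its span. The variable names are 
mine, and the outputs in the comments are what I'd expect on Python 3.7 
and later:

```python
import re

text = "pattern"

# List every match of ".*" together with its (start, end) span.
matches = [(m.span(), m.group()) for m in re.finditer(".*", text)]
print(matches)   # [((0, 7), 'pattern'), ((7, 7), '')]

# Both matches are replaced, so the replacement appears twice.
result = re.sub(".*", "replacement", text)
print(result)    # replacementreplacement
```

The second entry is the empty match at position 7, directly after the 
non-empty match that ends there.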

Seems somewhat counter-intuitive to me, but AFAICS it's the intended 
behavior.
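
If a single replacement is what you actually want, a few possible ways 
to get it are sketched below. This is only an illustration on the same 
sample string, not an official recommendation, and the variable names 
are mine:

```python
import re

text = "pattern"

# Anchor at the start: the empty match at the end no longer qualifies,
# because "^" (without re.MULTILINE) only matches at position 0.
anchored = re.sub("^.*", "replacement", text)

# Require at least one character, so empty matches are impossible.
non_empty = re.sub(".+", "replacement", text)

# Keep ".*" but limit the number of replacements to one.
limited = re.sub(".*", "replacement", text, count=1)

print(anchored, non_empty, limited)
```

Note that these differ when the input is an empty string: ".+" then 
matches nothing and leaves the string empty, while the other two still 
replace the single empty match.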

-- 
"Programming today is a race between software engineers striving to build bigger
and better idiot-proof programs, and the Universe trying to produce bigger and
better idiots. So far, the Universe is winning."
         -- Douglas Adams

