[New-bugs-announce] [issue10328] re.sub[n] doesn't seem to handle /Z replacements correctly in all cases

Alexander Schmolck report at bugs.python.org
Fri Nov 5 15:33:48 CET 2010


New submission from Alexander Schmolck <a.schmolck+bug at gmail.com>:

In certain cases a zero-width \Z match that should be replaced isn't.

An example might help:

 import re
 re.compile(r'(?m)(?P<trailing_ws>[ \t]+\r*$)|(?P<no_final_newline>(?<=[^\n])\Z)').subn(lambda m: next('<'+k+'>' for k, v in m.groupdict().items() if v is not None), 'foobar ')

this gives

 ('foobar<trailing_ws>', 1)

I would have expected

 ('foobar<trailing_ws><no_final_newline>', 2)
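Below is a self-contained version of the reproduction for checking the behaviour on a particular interpreter. The re.sub documentation records a change in Python 3.7: empty matches for the pattern are now replaced when they are adjacent to a previous non-empty match. On 3.7 and later, this sketch should therefore print the result the report expected. The versions listed in this report (2.6, 3.1) give the single replacement instead. PATTERN and label are illustrative names introduced here. The last line is the documentation's own minimal example of the same rule.

```python
import re

# The pattern from the report, written as a raw string so that \Z reaches
# the regex engine unchanged ("\Z" in a plain literal is an invalid escape).
PATTERN = re.compile(
    r'(?m)(?P<trailing_ws>[ \t]+\r*$)|(?P<no_final_newline>(?<=[^\n])\Z)'
)

def label(m):
    # Return "<name>" for the named group that took part in the match.
    # An empty \Z match gives '' (not None) for its group, so it is labelled too.
    return next('<' + k + '>' for k, v in m.groupdict().items() if v is not None)

result = PATTERN.subn(label, 'foobar ')
print(result)  # Python 3.7+: ('foobar<trailing_ws><no_final_newline>', 2)

# Documented minimal example: the empty match right after "x" is replaced too.
print(re.sub('x*', '-', 'abxd'))  # Python 3.7+: -a-b--d-
```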

Contrast this with the following behavior:

 [m.span() for m in re.compile(r'(?P<trailing_ws>[ \t]+\r*$)|(?P<no_final_newline>(?<=[^\n])\Z)', re.M).finditer('foobar ')]

gives
 
 [(6, 7), (7, 7)]

The two matches clearly do not overlap. The re module documentation for sub says "Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl.", so I would have expected two replacements.
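One way to get replacements that agree with finditer() on any version is to build the result directly from finditer()'s matches. This is a literal reading of the "leftmost non-overlapping occurrences" wording quoted above. The following is a minimal sketch that assumes finditer()'s sequence of matches is exactly the set you want replaced. subn_via_finditer and label are hypothetical names, not part of the re module.

```python
import re

PATTERN = re.compile(
    r'(?m)(?P<trailing_ws>[ \t]+\r*$)|(?P<no_final_newline>(?<=[^\n])\Z)'
)

def subn_via_finditer(pattern, repl, string):
    """Hypothetical helper: replace exactly the matches finditer() yields.

    Returns (new_string, number_of_replacements), mirroring re.subn().
    finditer() yields non-overlapping matches left to right, so each match
    starts at or after the end of the previous one.
    """
    pieces = []
    last_end = 0
    count = 0
    for m in pattern.finditer(string):
        pieces.append(string[last_end:m.start()])  # text between matches
        pieces.append(repl(m))                     # replacement for this match
        last_end = m.end()
        count += 1
    pieces.append(string[last_end:])               # text after the last match
    return ''.join(pieces), count

def label(m):
    # "<name>" of the named group that matched (an empty match gives '', not None).
    return next('<' + k + '>' for k, v in m.groupdict().items() if v is not None)

print(subn_via_finditer(PATTERN, label, 'foobar '))
```

Because the result is assembled from the same matches that produced [(6, 7), (7, 7)], the empty \Z match at position 7 is replaced as well. The cost is that the helper bypasses re's own replacement-template handling, so repl must be a callable here.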


This appears to be what Perl does:

 echo -n 'foobar ' | perl -pe 's/(?m)(?P<trailing_ws>[ \t]+\r*$)|(?P<no_final_newline>(?<=[^\n])\Z)/<$&>/g'                    

gives
 foobar< ><>%
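For comparison with the Perl one-liner, here is roughly the same substitution in Python. It is an illustration, not part of the original report. In the replacement template, \g<0> inserts the whole match, playing the role of Perl's $&. On Python 3.7 or later it should print foobar< ><>, the same text Perl produced. The trailing % in the Perl output is not produced by Perl; it is most likely the marker zsh prints when output lacks a final newline.

```python
import re

# Same regex as the Perl command. Perl also accepts the (?P<name>...) syntax.
# Note: Python's \Z matches only at the very end of the string (like Perl's \z),
# while Perl's \Z also matches before a final newline; the input has no newline,
# so both behave the same here.
PATTERN = r'(?m)(?P<trailing_ws>[ \t]+\r*$)|(?P<no_final_newline>(?<=[^\n])\Z)'

# \g<0> in the replacement template inserts the entire match, like Perl's $&.
print(re.sub(PATTERN, r'<\g<0>>', 'foobar '))  # Python 3.7+: foobar< ><>
```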

----------
components: Regular Expressions
messages: 120499
nosy: Alexander.Schmolck
priority: normal
severity: normal
status: open
title: re.sub[n] doesn't seem to handle /Z replacements correctly in all cases
type: behavior
versions: Python 2.6, Python 3.1

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10328>
_______________________________________

