[Python-checkins] r59896 - in python/trunk: Doc/library/re.rst Lib/test/test_re.py

Thu Jan 10 22:59:42 CET 2008

Author: amaury.forgeotdarc
Date: Thu Jan 10 22:59:42 2008
New Revision: 59896

Modified:
   python/trunk/Doc/library/re.rst
   python/trunk/Lib/test/test_re.py
Log:
Closing issue1761.
Surprising behaviour of the "$" regexp: it matches the
end of the string, AND just before the newline at the end 
of the string::

    re.sub('$', '#', 'foo\n') == 'foo#\n#'

Python is consistent with Perl and the pcre library, so
we just document it.
Guido prefers "\Z" to match only the end of the string.

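A minimal sketch of the behaviour described in the log, assuming a Python 3 interpreter (it is not part of this commit): `$` yields two substitutions on `'foo\n'`, while the `\Z` alternative yields only one.

```python
import re

text = 'foo\n'

# '$' matches just before the trailing newline AND at the very end,
# so re.sub() replaces two empty matches.
dollar = re.sub('$', '#', text)

# r'\Z' matches only at the very end of the string.
end_only = re.sub(r'\Z', '#', text)

print(repr(dollar))    # 'foo#\n#'
print(repr(end_only))  # 'foo\n#'
```

Without a trailing newline the two patterns agree: both turn `'foo'` into `'foo#'`.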


Modified: python/trunk/Doc/library/re.rst
==============================================================================

--- python/trunk/Doc/library/re.rst	(original)
+++ python/trunk/Doc/library/re.rst	Thu Jan 10 22:59:42 2008
@@ -98,7 +98,9 @@
    string, and in :const:`MULTILINE` mode also matches before a newline.  ``foo``
    matches both 'foo' and 'foobar', while the regular expression ``foo$`` matches
    only 'foo'.  More interestingly, searching for ``foo.$`` in ``'foo1\nfoo2\n'``
-   matches 'foo2' normally, but 'foo1' in :const:`MULTILINE` mode.
+   matches 'foo2' normally, but 'foo1' in :const:`MULTILINE` mode; searching for
+   a single ``$`` in ``'foo\n'`` will find two (empty) matches: one just before
+   the newline, and one at the end of the string.
 
 ``'*'``
    Causes the resulting RE to match 0 or more repetitions of the preceding RE, as

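The two empty matches mentioned in the new documentation text can be observed with `re.finditer()`. The sketch below (Python 3 assumed, not part of the commit) also reproduces the `foo.$` example from the same paragraph.

```python
import re

# Spans of the empty matches of '$' in 'foo\n':
# one just before the newline (index 3) and one at the end (index 4).
spans = [m.span() for m in re.finditer('$', 'foo\n')]
print(spans)  # [(3, 3), (4, 4)]

text = 'foo1\nfoo2\n'
# Without MULTILINE, '$' only matches before the final newline or at the end,
# so the first line cannot satisfy 'foo.$'.
normal = re.search('foo.$', text).group()
# With MULTILINE, '$' also matches before every newline, so the first line wins.
multi = re.search('foo.$', text, re.MULTILINE).group()
print(normal, multi)  # foo2 foo1
```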
Modified: python/trunk/Lib/test/test_re.py
==============================================================================
--- python/trunk/Lib/test/test_re.py	(original)
+++ python/trunk/Lib/test/test_re.py	Thu Jan 10 22:59:42 2008
@@ -671,6 +671,18 @@
         q = p.match(upper_char)
         self.assertNotEqual(q, None)
 
+    def test_dollar_matches_twice(self):
+        "$ matches the end of string, and just before the terminating \n"
+        pattern = re.compile('$')
+        self.assertEqual(pattern.sub('#', 'a\nb\n'), 'a\nb#\n#')
+        self.assertEqual(pattern.sub('#', 'a\nb\nc'), 'a\nb\nc#')
+        self.assertEqual(pattern.sub('#', '\n'), '#\n#')
+
+        pattern = re.compile('$', re.MULTILINE)
+        self.assertEqual(pattern.sub('#', 'a\nb\n' ), 'a#\nb#\n#' )
+        self.assertEqual(pattern.sub('#', 'a\nb\nc'), 'a#\nb#\nc#')
+        self.assertEqual(pattern.sub('#', '\n'), '#\n#')
+
 
 def run_re_tests():
     from test.re_tests import benchmarks, tests, SUCCEED, FAIL, SYNTAX_ERROR
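
A hypothetical companion test, not included in r59896, could pin down the `\Z` alternative in the same style as `test_dollar_matches_twice`. The class and method names are illustrative only. Unlike `$`, `\Z` is unaffected by `re.MULTILINE`, so both flag settings expect a single substitution at the very end.

```python
import re
import unittest


class BackslashZTest(unittest.TestCase):
    # Hypothetical test (not part of r59896): r'\Z' matches only at the
    # very end of the string, with or without re.MULTILINE.
    def test_backslash_z_matches_once(self):
        for flags in (0, re.MULTILINE):
            with self.subTest(flags=flags):
                pattern = re.compile(r'\Z', flags)
                self.assertEqual(pattern.sub('#', 'a\nb\n'), 'a\nb\n#')
                self.assertEqual(pattern.sub('#', 'a\nb\nc'), 'a\nb\nc#')
                self.assertEqual(pattern.sub('#', '\n'), '\n#')
```

Saved to a file, it can be run with `python -m unittest` like any other `TestCase`.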