Raw string substitution problem

Thu Dec 17 11:51:26 EST 2009

On 12/17/2009 11:24 AM, Richard Brodie wrote:
> A raw string is not a distinct type from an ordinary string
> in the same way byte strings and Unicode strings are. It
> is merely a notation for constants, like writing integers
> in hexadecimal.
>
>>>> (r'\n', u'a', 0x16)
> ('\\n', u'a', 22)

Yes, that was a mistake.  But the problem remains::

         >>> re.sub('abc', r'a\nb\n.c\a','123abcdefg') == re.sub('abc', 'a\\nb\\n.c\\a','123abcdefg') == re.sub('abc', 'a\nb\n.c\a','123abcdefg')
         True
         >>> r'a\nb\n.c\a' == 'a\\nb\\n.c\\a' == 'a\nb\n.c\a'
         False

Why are the first two replacement strings being treated as if they were
the last one? That is, why isn't '\\' being processed in the obvious way,
as a literal backslash? This still seems wrong.  Why isn't it?
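
One way to probe the comparison above is to separate the two stages at which
escapes can be handled. This is a minimal sketch, assuming the behaviour
described in the re documentation, namely that backslash escapes in a string
repl are processed by re.sub itself. The names raw, doubled and cooked are
only illustrative::

```python
import re

raw = r'a\nb\n.c\a'          # backslash + letter pairs, 10 characters
doubled = 'a\\nb\\n.c\\a'    # the same string, written without the r prefix
cooked = 'a\nb\n.c\a'        # real newline and bell characters, 7 characters

# Stage 1: the Python literals. Only the first two are equal.
assert raw == doubled
assert raw != cooked

# Stage 2: re.sub runs its own escape pass over a string repl, so the
# two-character sequence backslash + n is turned into a newline here.
out_raw = re.sub('abc', raw, '123abcdefg')
out_cooked = re.sub('abc', cooked, '123abcdefg')
assert out_raw == out_cooked == '123a\nb\n.c\adefg'
```

If that reading is right, the three results agree because the second stage
does for the first two strings what the Python parser already did for the
third.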

More simply, consider::

         >>> re.sub('abc', '\\', '123abcdefg')
         Traceback (most recent call last):
           File "<stdin>", line 1, in <module>
           File "C:\Python26\lib\re.py", line 151, in sub
             return _compile(pattern, 0).sub(repl, string, count)
           File "C:\Python26\lib\re.py", line 273, in _subx
             template = _compile_repl(template, pattern)
           File "C:\Python26\lib\re.py", line 260, in _compile_repl
             raise error, v # invalid expression
         sre_constants.error: bogus escape (end of line)

Why is this the proper handling of what one might think would be an
obvious substitution?
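
For comparison, here is a sketch of two spellings that do put a single
backslash into the result, again assuming that re.sub processes escapes in a
string repl but uses the return value of a callable repl verbatim. The
variable names are hypothetical::

```python
import re

text = '123abcdefg'

# A lone backslash is an incomplete escape in the replacement template.
try:
    re.sub('abc', '\\', text)
    lone_backslash_ok = True
except re.error:
    lone_backslash_ok = False

# The Python literal '\\\\' is two backslash characters, which the
# template reads as one escaped (literal) backslash.
via_doubling = re.sub('abc', '\\\\', text)

# A callable repl is not treated as a template; its result is used as is.
via_function = re.sub('abc', lambda match: '\\', text)

assert not lone_backslash_ok
assert via_doubling == via_function == '123\\defg'
```

The callable form may be the safer choice when the replacement text comes
from elsewhere and might contain backslashes.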

Thanks,
Alan Isaac