[Tutor] use of raw strings with regular expression patterns

Cameron Simpson cs at cskk.id.au
Sat Nov 7 16:42:02 EST 2020


On 06Nov2020 22:33, Manprit Singh <manpritsinghece at gmail.com> wrote:
>As you know there are some special characters in regular expressions,
>like:
>\A, \B, \b, \d, \D, \s, \S, \w, \W, \Z
>
>is it necessary to use raw string notation like r'\A' while using re
>patterns made up of these characters ?

Another thing not mentioned in the replies is the backslash itself.

The advantage of a raw string is that when you write a backslash, it is 
part of the string as-is.

So to put a backslash in a regular string, so that it is part of the 
result, you would need to write:

    \\

In a raw string, you just write:

    \

exactly as you want things.
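
As a small sketch of the point above (the variable names are mine, purely 
for illustration), the two spellings produce the same three-character 
string. The example puts characters on either side of the backslash 
because a raw string literal cannot end in a single backslash:

```python
# Hypothetical names, chosen only for illustration.
regular = 'a\\b'   # regular string: the backslash must be doubled
raw = r'a\b'       # raw string: the backslash is written once, as-is

print(regular == raw)  # True: both are "a", backslash, "b"
print(len(raw))        # 3
```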

Now, it happens that in a regular string a backslash _not_ followed by a 
special character (eg "n" for "\n", a newline) is preserved, so it gets 
through to the final string anyway. But the moment you _do_ follow the 
backslash with such a character, the backslash is consumed and the 
character translated.

Example:

    \h

Ordinary string: '\h' -> \h
Raw string: r'\h' -> \h
A backslash and an "h" in the result.

But:

    \n

Ordinary string: '\n' -> newline
Raw string: r'\n' -> \n
A newline in the result for the former, a backslash and an "n" for the 
latter.
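
The two examples can be checked at the interpreter with something like 
the following sketch (variable names are illustrative only). The ordinary 
'\h' literal is left out of the code because recent Python versions emit 
a warning for unrecognised escapes such as "\h"; the comparison at the 
end shows the equivalent doubled-backslash spelling instead:

```python
ordinary = '\n'   # escape is translated: one newline character
raw = r'\n'       # kept as-is: a backslash followed by "n"

print(len(ordinary), repr(ordinary))  # 1 '\n'
print(len(raw), repr(raw))            # 2 '\\n'

# The raw spelling of \h is identical to doubling the backslash by hand.
print(r'\h' == '\\h')  # True
```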

So the advantage of the raw string is _reliably preserving the 
backslash_.

For any situation where backslashes are intended in the resulting string 
it is recommended to use a "raw" string in Python, for this reliability.

The two common situations are regexps, where backslash introduces special 
sequences and character classes, and Windows file paths, where backslash 
is the file separator.
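
A rough sketch of both situations, assuming a made-up sample text and a 
made-up Windows path (neither comes from the original question). It shows 
what goes wrong when the "r" prefix is left off:

```python
import re

text = 'a cat sat'

# Raw string: \b reaches the regexp engine as a word-boundary marker.
print(bool(re.search(r'\bcat\b', text)))  # True

# Ordinary string: "\b" is translated to a backspace character first,
# so the pattern looks for backspace, "cat", backspace.
print(bool(re.search('\bcat\b', text)))   # False

# Hypothetical Windows path: "\n" and "\t" are translated in the
# ordinary string, but preserved in the raw string.
ordinary_path = 'C:\new\table.txt'
raw_path = r'C:\new\table.txt'
print('\n' in ordinary_path, '\t' in ordinary_path)  # True True
print(raw_path)  # C:\new\table.txt
```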

Cheers,
Cameron Simpson <cs at cskk.id.au>

