[Tutor] use of raw strings with regular expression patterns

Manprit Singh manpritsinghece at gmail.com
Sun Nov 8 08:28:54 EST 2020


Dear Sir,

I have one more very basic question.
Suppose I have to remove every "a" from the string s1.

>>> import re
>>> s1 = "saaaaregaaaaamaaaa"
>>> re.sub(r"a+", "", s1)
'sregm'
>>> re.sub(r"a", "", s1)
'sregm'

I have solved this with two patterns. One includes a "+", which means one
or more repetitions of the preceding RE. I am confused about which pattern
should be chosen for this particular case.
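
For this particular string both patterns give the same output, so the
choice is mostly about intent. As a minimal sketch (the variable names are
only illustrative and not from the thread), re.subn reports how many
substitutions each pattern makes, and a non-empty replacement shows where
the two patterns really differ:

```python
import re

s1 = "saaaaregaaaaamaaaa"

# re.subn returns a tuple: (new_string, number_of_substitutions_made).
removed_plus, count_plus = re.subn(r"a+", "", s1)      # one substitution per run of "a"
removed_single, count_single = re.subn(r"a", "", s1)   # one substitution per "a"

print(removed_plus, count_plus)
print(removed_single, count_single)

# With a non-empty replacement the results are no longer the same:
collapsed = re.sub(r"a+", "-", s1)   # each run of "a" becomes a single "-"
per_char = re.sub(r"a", "-", s1)     # every "a" becomes its own "-"

print(collapsed)
print(per_char)
```

If the aim is only to delete a fixed character, s1.replace("a", "") should
give the same result without the re module at all.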

Regards
Manprit Singh

On Sun, Nov 8, 2020 at 3:12 AM Cameron Simpson <cs at cskk.id.au> wrote:

> On 06Nov2020 22:33, Manprit Singh <manpritsinghece at gmail.com> wrote:
> >As you know, there are some special characters in regular expressions,
> >like:
> >\A, \B, \b, \d, \D, \s, \S, \w, \W, \Z
> >
> >Is it necessary to use raw string notation like r'\A' when using re
> >patterns made up of these characters?
>
> Another thing not mentioned in the replies is the backslash itself.
>
> The advantage of a raw string is that when you write a backslash, it is
> part of the string as-is.
>
> So to put a backslash in a regular string, so that it is part of the
> result, you would need to write:
>
>     \\
>
> In a raw string, you just write:
>
>     \
>
> exactly as you want things.
>
> Now, it happens that in a regular string a backslash _not_ followed by a
> special character (e.g. "n" for "\n", a newline) is preserved, although
> Python 3.6 and later emit a warning for such unrecognised escapes. So
> they get through to the final string anyway. But the moment you _do_
> follow the backslash with such a character, it is consumed and the
> character translated.
>
> Example:
>
>     \h
>
> Ordinary string: '\h' -> \h
> Raw string: r'\h' -> \h
> A backslash and an "h" in the result.
>
> But:
>
>     \n
>
> Ordinary string: '\n' -> newline
> Raw string: r'\n' -> \n
> A newline in the result for the former, a backslash and an "n" for the
> latter.
>
> So the advantage of the raw string is _reliably preserving the
> backslash_.
>
> For any situation where backslashes are intended in the resulting string
> it is recommended to use a "raw" string in Python, for this reliability.
>
> The two common situations are regexps, where backslash introduces special
> character classes, and Windows file paths, where backslash is the file
> separator.
>
> Cheers,
> Cameron Simpson <cs at cskk.id.au>
>
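
The difference described in the quoted reply can be checked directly. The
following is only a sketch: the sample text and the Windows-style path are
made up for illustration. It deliberately avoids an ordinary '\h' literal,
because recent Python versions warn about unrecognised escapes, and uses
"\n" and "\b" instead, which are recognised escapes in ordinary strings.

```python
import re

# "\n" is a recognised escape: the ordinary literal is ONE character.
newline = "\n"
raw_n = r"\n"          # raw literal: TWO characters, backslash and "n"
print(len(newline), len(raw_n))

# "\b" is also a recognised escape: a backspace character in an ordinary
# string, but the word-boundary sequence once the backslash reaches re.
text = "cat concat cat"                       # hypothetical sample text
word_hits = re.findall(r"\bcat\b", text)      # whole-word matches only
literal_hits = re.findall("\bcat\b", text)    # looks for backspace, "cat", backspace
print(word_hits, literal_hits)

# Doubling every backslash in an ordinary string gives the same pattern.
same_pattern = "\\bcat\\b" == r"\bcat\b"
print(same_pattern)

# Hypothetical Windows-style path: "\n" and "\t" would be translated
# in an ordinary string, but survive untouched in a raw one.
raw_path = r"C:\new\table.txt"
plain_path = "C:\new\table.txt"
print(repr(raw_path))
print(repr(plain_path))
```

So for the sequences listed in the original question, a raw string is not
strictly required in every case (for example '\d' currently survives in an
ordinary string), but '\b' shows why relying on that is fragile.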

