Module re documentation bug, error, or misunderstanding?

Fri Jul 26 17:48:24 EDT 2002

    Norman> This solution leads to another question, probably a
    Norman> documentation issue.  I do not need "r" or double backslashes
    Norman> when I use \d, \s, \S, ... all the special sequences that
    Norman> \number is included with in the documentation.  What indication
    Norman> is there in the documentation that \number must be handled
    Norman> differently than say, \d ?

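You're fighting two battles at once when you don't use raw strings.
Python's lexical analyzer recognizes a wide array of escape sequences.  For
example, "\t" is a one-character string containing an ASCII TAB character,
not a two-character string containing a backslash followed by a "t".  Since
lexical analysis is complete before the regular expression compiler sees the
strings, any escape sequence the lexical analyzer recognizes has already been
replaced by that stage of the game.  That is the difference you're seeing:
"\d", "\s" and "\S" aren't string escapes, so Python leaves the backslash in
place and the regular expression compiler gets to see it, while "\1" is an
octal escape that Python turns into a single character (ASCII 1) before the
regular expression compiler ever runs.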
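
Here is a small sketch of what the lexical analyzer does to a few literals
before any other code sees them.  The variable names are just for
illustration; the lengths are what matter.

```python
# Recognized escapes are converted by Python's lexical analyzer
# before the string ever reaches the re module.
tab = "\t"          # one character: ASCII TAB
group_ref = "\1"    # one character: octal escape, same as chr(1)
backspace = "\b"    # one character: ASCII backspace, not a word boundary

print(len(tab), len(group_ref), len(backspace))   # 1 1 1
print(group_ref == chr(1))                        # True

# Doubling the backslash keeps it in the string: two characters each.
escaped_tab = "\\t"
escaped_ref = "\\1"
print(len(escaped_tab), len(escaped_ref))         # 2 2
```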

Using raw strings ("r" prefix) tells the lexical analyzer to not interpret
escape sequences in that string.  Thus r"\t" is a two-character sequence
consisting of a backslash followed by a "t".  Raw strings were created
specifically to avoid this conflict between the lexical analyzer and the
regular expression compiler.  You can think of it as the usually boorish
lexical analyzer showing some manners by not eating all the shrimp at the
buffet. ;-)
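
To see the effect on a backreference, here is a minimal example.  The
sample text and pattern are made up for illustration and are not taken from
the original question.

```python
import re

text = "hello hello world"

# Raw string: the regular expression compiler receives a backslash
# followed by "1" and treats it as a backreference to group 1.
raw_match = re.search(r"([a-z]+) \1", text)

# Ordinary string: "\1" has already become chr(1), so the pattern asks
# for a literal control character that isn't in the text.
cooked_match = re.search("([a-z]+) \1", text)

# Ordinary string with a doubled backslash: equivalent to the raw form.
doubled_match = re.search("([a-z]+) \\1", text)

print(raw_match.group(0))       # hello hello
print(cooked_match)             # None
print(doubled_match.group(0))   # hello hello
```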

For a discussion of string literals, check this section of the language
reference manual:

    http://www.python.org/doc/current/ref/strings.html

For a discussion of regular expression syntax, check this section of the
library manual:

    http://www.python.org/doc/current/lib/re-syntax.html

I'm sure there are other documentation bits that may help as well, but these
two pages list the escape sequences the two parts of the system understand.

-- 
Skip Montanaro
skip at pobox.com
consulting: http://manatee.mojam.com/~skip/resume.html