4.2.2 Module Contents

 

The module defines the following functions and constants, and an exception:

compile (pattern[, flags])
Compile a regular expression pattern into a regular expression object, which can be used for matching using its match and search methods, described below.

The expression's behaviour can be modified by specifying a flags value. Values can be any of the following variables, combined using bitwise OR (the | operator).

I or IGNORECASE or (?i)

Perform case-insensitive matching; expressions like [A-Z] will match lowercase letters, too. This is not affected by the current locale.

L or LOCALE or (?L)

Make \w, \W, \b, and \B dependent on the current locale.

M or MULTILINE or (?m)

When specified, the pattern character ^ matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character $ matches at the end of the string and at the end of each line (immediately preceding each newline). By default, ^ matches only at the beginning of the string, and $ only at the end of the string and immediately before the newline (if any) at the end of the string.

S or DOTALL or (?s)

Make the . special character match any character at all, including a newline; without this flag, . will match anything except a newline.

X or VERBOSE or (?x)

Ignore whitespace within the pattern except when in a character class or preceded by an unescaped backslash. When a line contains a # that is neither in a character class nor preceded by an unescaped backslash, all characters from the leftmost such # through the end of the line are ignored.
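
As a rough illustration of the flags listed above, the following sketch combines them with | and also uses the embedded (?im) form. The sample text and variable names are made up for this example.

```python
import re

text = "first line\nSECOND line"

# Without MULTILINE, ^ anchors only at the start of the whole string.
plain = re.search(r"^second", text)                  # no match

# IGNORECASE | MULTILINE: ^ also anchors after the newline, case is ignored.
flagged = re.search(r"^second", text, re.I | re.M)

# The same two flags written as embedded modifiers inside the pattern.
inline = re.search(r"(?im)^second", text)

# Without DOTALL, . refuses to match the newline between the two lines.
no_dotall = re.search(r"line.SECOND", text)          # no match
dotall = re.search(r"line.SECOND", text, re.DOTALL)

# VERBOSE: whitespace and # comments inside the pattern are ignored.
verbose = re.compile(r"""
    \d+      # integer part
    \.       # literal dot
    \d*      # optional fraction digits
""", re.VERBOSE)
number = verbose.search("pi is about 3.14")
```

Here plain and no_dotall are None, while the flagged variants each return a match object.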

The sequence

prog = re.compile(pat)
result = prog.match(str)
is equivalent to
result = re.match(pat, str)
but the version using compile() is more efficient when the expression will be used several times in a single program.
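
A minimal sketch of the reuse case, assuming a list of candidate names to check; the pattern and the names identifier, candidates, and valid are illustrative only.

```python
import re

# Compile once, outside the loop, then reuse the object for every item.
identifier = re.compile(r"[A-Za-z_]\w*$")

candidates = ["spam", "2eggs", "_private", "has space"]

# match() anchors at the start of each string; $ requires the name to end there.
valid = [name for name in candidates if identifier.match(name)]
```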

escape (string)
Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
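
For instance, a string containing +, ? and parentheses will usually not match itself when used directly as a pattern. The sketch below, with a made-up sample string, shows the difference escape() makes.

```python
import re

literal = "1+1=2 (really?)"

# Used as-is, the +, ?, ( and ) are treated as metacharacters,
# so the pattern does not match its own text.
unescaped = re.search(literal, literal)

# After escape(), every character is matched literally.
escaped = re.search(re.escape(literal), literal)
```

The exact set of characters that escape() backslashes has varied between Python releases, so it is safer to rely on the matching behaviour than on the escaped string itself.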

match (pattern, string[, flags])
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.

search (pattern, string[, flags])
Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
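
The difference between match() and search(), and between None and a zero-length match, can be sketched as follows; the sample text is arbitrary.

```python
import re

text = "say hello"

# match() only tries position 0, so "hello" is not found.
at_start = re.match(r"hello", text)

# search() scans forward and finds it later in the string.
anywhere = re.search(r"hello", text)

# x* can match zero characters, so this succeeds with an empty match.
empty = re.match(r"x*", text)

# Testing against None distinguishes "no match" from "matched nothing".
found_empty = empty is not None and empty.group(0) == ""
```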

split (pattern, string[, maxsplit=0])
Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text matched by the groups in the pattern is also returned as part of the resulting list.
>>> re.split(r'[\W]+', 'Words, words, words.')
['Words', 'words', 'words', '']
>>> re.split(r'([\W]+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']
This function combines and extends the functionality of the old regex.split() and regex.splitx().
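
The maxsplit argument appears in the signature but is not described above. In current Python releases, a nonzero maxsplit limits the number of splits and the remainder of the string becomes the last element. A small sketch under that assumption:

```python
import re

text = 'Words, words, words.'

# At most one split; the rest of the string is returned unsplit.
limited = re.split(r'\W+', text, maxsplit=1)

# With a capturing group, the separator text is included as well.
limited_with_sep = re.split(r'(\W+)', text, maxsplit=1)
```

Passing maxsplit by keyword, as here, avoids ambiguity with other positional arguments.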

sub (pattern, repl, string[, count=0])
Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn't found, string is returned unchanged. repl can be a string or a function; if a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string. For example:
>>> def dashrepl(matchobj):
...    if matchobj.group(0) == '-': return ' '
...    else: return '-'
>>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
'pro--gram files'
The pattern may be a string or a regex object; if you need to specify regular expression flags, you must use a regex object, or use embedded modifiers in a pattern; e.g.
sub("(?i)b+", "x", "bbbb BBBB") returns 'x x'.
The optional argument count is the maximum number of pattern occurrences to be replaced; count must be a non-negative integer, and the default value of 0 means to replace all occurrences.

Empty matches for the pattern are replaced only when not adjacent to a previous match, so sub('x*', '-', 'abc') returns '-a-b-c-'.
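
The following sketch pulls together the points above about count, regex objects as the pattern, and empty matches. The input strings are invented for illustration.

```python
import re

# count limits how many occurrences are replaced, working left to right.
first_two = re.sub('-', '+', 'a-b-c-d', count=2)

# A compiled regex object carries its own flags into sub().
case_blind = re.compile('b+', re.IGNORECASE)
replaced = re.sub(case_blind, 'x', 'bbbb BBBB')

# x* matches the empty string at every position of 'abc'.
dashed = re.sub('x*', '-', 'abc')
```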

subn (pattern, repl, string[, count=0])
Perform the same operation as sub(), but return a tuple (new_string, number_of_subs_made).
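
A short sketch of the returned tuple, using made-up inputs:

```python
import re

# Two hyphens are replaced, so the count is 2.
new_string, number_of_subs_made = re.subn('-', '+', 'a-b-c')

# When nothing matches, the string comes back unchanged with a count of 0.
unchanged = re.subn('z', '+', 'abc')
```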

error
Exception raised when a string passed to one of the functions here is not a valid regular expression (e.g., unmatched parentheses) or when some other error occurs during compilation or matching. (It is never an error if a string contains no match for a pattern.)
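
One way to guard against invalid patterns, for example when they come from user input, is to catch re.error around compile(). The helper try_compile below is hypothetical and exists only for this sketch.

```python
import re

def try_compile(pattern):
    """Return a compiled regex object, or None if the pattern is invalid."""
    try:
        return re.compile(pattern)
    except re.error:
        return None

bad = try_compile("(unclosed")     # unmatched parenthesis raises re.error
good = try_compile("(closed)")

# A valid pattern that finds nothing returns None; no exception is raised.
no_match = good.search("nothing here")
```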

guido@CNRI.Reston.Va.US