match point

Thomas 'PointedEars' Lahn PointedEars at web.de
Tue Dec 22 07:01:41 EST 2015


Thierry wrote:

> Reading the docs about regular expressions, I am under the impression
> that calling
> re.match(pattern, string)
> is exactly the same as
> re.search(r'\A'+pattern, string)

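Correct, as long as the pattern contains no top-level alternation; otherwise 
it has to be grouped first, as in r'\A(?:' + pattern + r')'.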
 
> Same for fullmatch, that amounts to
> re.search(r'\A'+pattern+r'\Z', string)

Correct, provided the pattern is wrapped in “(?:…)” if it contains a 
top-level “|”.
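
A quick way to check both equivalences, as a minimal sketch: the helper 
names and the sample patterns are made up for illustration.  The helpers 
wrap the pattern in a non-capturing group, because plain concatenation lets 
a top-level “|” escape the anchor.

```python
import re

def match_via_search(pattern, string, flags=0):
    # Emulates re.match() with re.search(); the (?:...) group keeps a
    # top-level "|" in the pattern from escaping the \A anchor.
    return re.search(r'\A(?:' + pattern + r')', string, flags)

def fullmatch_via_search(pattern, string, flags=0):
    # Emulates re.fullmatch() (Python 3.4+) with re.search().
    return re.search(r'\A(?:' + pattern + r')\Z', string, flags)

samples = [('ab', 'abc'), ('ab', 'xab'), ('a|b', 'xb'), ('a|ab', 'ab')]
for pattern, string in samples:
    assert bool(re.match(pattern, string)) == \
        bool(match_via_search(pattern, string))
    assert bool(re.fullmatch(pattern, string)) == \
        bool(fullmatch_via_search(pattern, string))

# Without the group, \A binds only to the first alternative:
print(re.match('a|b', 'xb'))            # None
print(re.search(r'\A' + 'a|b', 'xb'))   # matches the "b" at index 1
```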
 
> The docs devote a chapter to "6.2.5.3. search() vs. match()", but they
> only discuss how match() is different from search() with '^', completely
> eluding the case of search() with r'\A'.
> 
> At first I thought those functions could have been introduced at a time
> when r'\A' and r'\Z' did not exist, but then I noticed that re.fullmatch
> is a recent addition (python 3.4)
> 
> Maybe re.match has an implementation that makes it more efficient? But
> then why would I ever use r'\A', since that anchor makes a pattern match
> in only a single position, and is therefore useless in functions like
> re.findall, re.finditer or re.split?

(Thank you for pointing out “\A” and “\Z”; this strongly suggests that even 
in a raw string you should always match a literal “\” with the regular 
expression “\\”, or in other words, that you should always use re.escape() 
when constructing regular expressions from arbitrary strings, for example 
to match Windows/UNC paths.)
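
To illustrate the point about backslashes, here is a small sketch; the UNC 
path is a made-up example.  Used unescaped as a pattern, the “\A” inside the 
path is read as the start-of-string anchor rather than as a backslash 
followed by “A”.

```python
import re

# Hypothetical UNC path containing "\A", used only for illustration.
path = r'\\server\Archive'
text = 'Files live on ' + path + ' now.'

# Unescaped, "\\" matches one backslash and "\A" is the start-of-string
# anchor, so the pattern can never match the path inside the text.
print(re.search(path, text))              # None

# re.escape() makes every backslash a literal one.
m = re.search(re.escape(path), text)
print(m.group() == path)                  # True
```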

If you use

  re.search(r'\Afoo.*^bar$.*baz\Z', string, flags=re.DOTALL | re.MULTILINE)

you match only strings that start with “foo”, contain a later line that 
consists only of “bar”, and end with “baz”.  (In multi-line mode, the 
meanings of “^” and “$” change to start-of-line and end-of-line, 
respectively, while “\A” and “\Z” still refer to the string as a whole.)
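
A minimal sketch of how this expression behaves; the helper name and the 
test strings are invented for illustration.

```python
import re

pattern = r'\Afoo.*^bar$.*baz\Z'
flags = re.DOTALL | re.MULTILINE

def matches(string):
    # True if the whole-string pattern above is found in `string`.
    return re.search(pattern, string, flags=flags) is not None

print(matches('foo\nbar\nbaz'))        # True
print(matches('foo 1\nbar\n2 baz'))    # True
print(matches('foo\nbarn\nbaz'))       # False: no line is exactly "bar"
print(matches('x\nfoo\nbar\nbaz'))     # False: does not start with "foo"
print(matches('foo\nbar\nbaz\n'))      # False: \Z allows nothing after "baz"
```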

Presumably, re.fullmatch() was introduced in Python 3.4 so that you can 
write

  re.fullmatch(r'foo.*^bar$.*baz', string, flags=re.DOTALL | re.MULTILINE)

instead: you are not actually searching, and it guarantees that the match 
is *always* attempted against the whole string, regardless of the 
expression.

The documentation says:

| Note that even in MULTILINE mode, re.match() will only match at the
| beginning of the string and not at the beginning of each line.

and that

| re.search(pattern, string, flags=0)
|   Scan through string looking for the first location where the regular 
|   expression pattern produces a match […]
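
The behaviour described in these two passages can be checked directly; the 
sample text below is made up for illustration.

```python
import re

text = 'first line\nsecond line'

# re.match() is anchored at the start of the string, MULTILINE or not.
print(re.match(r'second', text, flags=re.MULTILINE))      # None
print(re.match(r'^second', text, flags=re.MULTILINE))     # None

# re.search() scans forward; "^" then matches at each line start.
m = re.search(r'^second', text, flags=re.MULTILINE)
print(m.start())                                          # 11

# "\A" keeps its meaning in MULTILINE mode: start of string only.
print(re.search(r'\Asecond', text, flags=re.MULTILINE))   # None
```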

So having both re.search() and re.fullmatch() gives you more flexibility 
when the expression is constructed dynamically: you can always fall back on 
re.search().

<https://docs.python.org/3/library/re.html#re.search>

Please add your last name, Thierry #1701.

-- 
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.


