[Tutor] re.search() help

Mon Apr 16 02:12:26 CEST 2012

Hi Michael,

On 16 April 2012 01:20, Michael Lewis <mjolewis at gmail.com> wrote:

>> So that you can find the section of a long string that
>> first matches your regex.
>>
>>
> Why not use re.findall() for that? It seems like re.findall() can fill
> this need and I wouldn't need to use re.search()? Can you compare an
> example circumstance where one would be better suited than the other?
>

To add to what Alan's said: regular expressions form a textual specification
language that is often used to specify tokens. For example, when designing
some type of computer language, or perhaps a preprocessor for a computer
language, the tokens themselves will typically be specified using regular
expressions (or something similar), while the grammar of the language will
typically be expressed using something like BNF or, more typically, EBNF
(Extended Backus-Naur Form, named after John Backus and Peter Naur).
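
Since you asked for a comparison, here's a minimal sketch of how the three
functions behave on the same input. The sample string and the digit pattern
are made up purely for illustration:

```python
import re

text = "x = 42 + y1"
number = re.compile(r"\d+")  # one or more digits

# match() only tries position 0; "x" is not a digit, so this is None.
at_start = number.match(text)
# search() scans forward and returns the first match anywhere in the string.
first = number.search(text)
# findall() returns every non-overlapping match as a list of strings.
every = number.findall(text)

print(at_start)                      # None
print(first.group(), first.start())  # 42 4
print(every)                         # ['42', '1']
```

Note that search() stops at the first hit and hands back a match object, so
you also learn where the match is (start(), end()) and can read its groups,
whereas findall() scans the whole string and returns just the matched text
(or the groups), without positions.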

Anyway, as an example of where you would not use re.findall() (or indeed
re.search()): a compiler typically has a scanner and a parser component.
The scanner's job is to take the input text, which can be seen as a
sequence of characters, and convert it into a sequence of tokens. This
tokenization may well involve regular expressions, and there it only makes
sense to match at the beginning of the text still to be scanned. So in such
a context you'd definitely not want re.search() or re.findall(), but would
rather use re.match(), since you're expressly trying to recognize the next
"word" (token) from the text being scanned (and, ultimately, parsed).
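
A minimal sketch of such a scanner follows. The token names and patterns
are hypothetical, invented for a tiny expression language, and combining
them into one alternation of named groups is just one common way to do it:

```python
import re

# Hypothetical token set for a tiny expression language (an assumption,
# not taken from any real compiler).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
# One alternation of named groups; m.lastgroup tells us which token matched.
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Yield (kind, value) pairs, always matching at the current position."""
    pos = 0
    while pos < len(text):
        # Pattern.match(string, pos) is anchored at pos; unlike search(),
        # it never skips ahead, so a stray character is caught right here.
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            yield m.lastgroup, m.group()
        pos = m.end()


print(list(tokenize("x = 42 + y1")))
```

Because match() is anchored at pos, a character that fits no token raises
an error immediately instead of being silently skipped, which is what you
want from a scanner.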

Another example: suppose you're implementing a preprocessor for a
programming language, and you therefore want to ignore most of the actual
programming language text (since you're really only interested in the
preprocessor language that is interspersed with the "normal" programming
language). In such a scenario the first (or next) preprocessor token will
likely not be at the beginning of your current block of program text, so
re.match() or re.findall() would not be helpful: you want to find just the
first token in the text, which may or may not be at the beginning of the
string. In such a case you'd therefore want to use re.search().
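
And a similarly rough sketch of the preprocessor case. The directive syntax
(a line starting with "#", loosely modelled on the C preprocessor) and the
helper name directives() are assumptions for illustration only; the point
is that search() skips over the ordinary code to reach the next directive:

```python
import re

# Hypothetical directive syntax: optional indentation, "#", then a word.
DIRECTIVE = re.compile(r"^[ \t]*#[ \t]*(\w+)(.*)$", re.MULTILINE)

source = """int x = 1;
#define LIMIT 10
int y = x + 2;
  # include "util.h"
return y;
"""


def directives(text):
    """Yield (name, argument) for each directive, skipping ordinary code."""
    pos = 0
    while True:
        # search() scans forward from pos to the next directive, wherever it
        # is; match() would fail because the text at pos is usually plain code.
        m = DIRECTIVE.search(text, pos)
        if m is None:
            return
        yield m.group(1), m.group(2).strip()
        pos = m.end()


for name, arg in directives(source):
    print(name, arg)
```

Because each step asks only for the next directive, a real preprocessor
could act on it (for example, skipping lines after a false conditional)
before resuming the search; that logic is left out here.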

HTH,

Walter