Regular expressions

Wed Nov 4 22:03:45 EST 2015

Steven D'Aprano <steve at pearwood.info> writes:

> On Wed, 4 Nov 2015 07:57 pm, Peter Otten wrote:
>
> > I tried Tim's example
> > 
> > $ seq 5 | grep '1*'
> > 1
> > 2
> > 3
> > 4
> > 5
> > $
>
> I don't understand this. What on earth is grep matching? How does "4"
> match "1*"?

You can experiment with regular expressions to find out. Here's a link
to the RegExr tool for the above pattern <URL:http://regexr.com/3c4ot>.

Matching patterns can include specifications meaning “match some number
of the preceding segment”, with the ‘{n,m}’ notation. That means “match
at least n, and at most m, occurrences of the preceding segment”. Either
‘n’ or ‘m’ can be omitted, meaning “at least 0” and “no maximum”
respectively.

Those are quite useful, so there are shortcuts for the most common
cases: ‘?’ is a short cut for ‘{0,1}’, ‘*’ is a short cut for ‘{0,}’,
and ‘+’ is a short cut for ‘{1,}’.

In this case, ‘*’ is a short cut for ‘{0,}’ meaning “match 0 or more
occurences of the preceding segment”. The segment here is the atom ‘1’.
Since ‘1*’ is the entirety of the pattern, the pattern can match zero
characters, anywhere within any string. So, it matches every possible
string.

To match (some atom) 1 or more times, ‘+’ is a short cut for ‘(1,}’
meaning “match 1 or more occurrences of the preceding segment”.

-- 
 \    學而不思則罔，思而不學則殆。 (To study and not think is a waste. |
  `\                             To think and not study is dangerous.) |
_o__)                            —孔夫子 Confucius (551 BCE – 479 BCE) |
Ben Finney