+ in regular expression

Duncan Booth duncan.booth at invalid.invalid
Fri Oct 5 05:23:26 EDT 2012


Cameron Simpson <cs at zip.com.au> wrote:

> On 03Oct2012 21:17, Ian Kelly <ian.g.kelly at gmail.com> wrote:
>| On Wed, Oct 3, 2012 at 9:01 PM, contro opinion
>| <contropinion at gmail.com> wrote: 
>| > why the  "\s{6}+"  is not a regular pattern?
>| 
>| Use a group: "(?:\s{6})+"
> 
> Yeah, it is probably a precedence issue in the grammar.
> "(\s{6})+" is also accepted.

It's about syntax, not precedence, but the documentation doesn't really 
spell it out in full. Like most regex documentation, it talks in woolly 
terms about special characters rather than giving a formal syntax.

A regular expression element may be followed by a single quantifier. 
Quantifiers are '*', '+', '?', '{n}', '{n,m}' (and the lazy forms 
'*?', '+?', '??', '{n}?', '{n,m}?'). Nothing in the regex language lets 
you follow an element with two quantifiers. Parentheses (capturing or 
non-capturing) around a regex turn that regex into a single element, 
which is why you can then apply another quantifier to it.
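Here is a quick check with Python's `re` module, offered as a sketch (the helper `rejects` is a hypothetical name). Stacking two quantifiers on one element fails with `re.error`, while wrapping the quantified element in a capturing or non-capturing group lets it take a second quantifier. One caveat since this thread: Python 3.11 added possessive quantifiers, so on 3.11 and later a `+` straight after a quantifier, as in `\s{6}+`, is read as a possessive modifier and compiles. The sketch therefore uses `a**` and `a{2}{3}`, which are rejected on any version.

```python
import re


def rejects(pattern: str) -> bool:
    """Return True if re.compile() raises re.error for this pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return True
    return False


# A second quantifier applied directly to an already-quantified element
# is a syntax error ("multiple repeat").
print(rejects(r"a**"))        # True
print(rejects(r"a{2}{3}"))    # True

# Grouping makes the quantified sub-pattern a single element, which can
# then take its own quantifier. Capturing and non-capturing groups both work.
for pat in (r"(?:\s{6})+", r"(\s{6})+"):
    rx = re.compile(pat)
    # 12 spaces is two runs of six; 9 spaces is not a multiple of six.
    print(pat, bool(rx.fullmatch(" " * 12)), bool(rx.fullmatch(" " * 9)))
```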

In BNF, I think Python's regexes would be something like:

re ::= union | simple-re
union ::= re "|" simple-re
simple-re ::= concatenation | basic-re
concatenation ::= simple-re basic-re
basic-re ::= element | element quantifier
element ::= group | nc-group | "." | "^" | "$" | char | charset
quantifier ::= "*" | "+" | "?" | "{" NUMBER "}" | "{" NUMBER "," NUMBER "}"
             | "*?" | "+?" | "??" | "{" NUMBER "}?" | "{" NUMBER "," NUMBER "}?"
group ::= "(" re ")"
nc-group ::= "(?:" re ")"
char ::= <any non-special character> | "\" <any character>

... and so on. I didn't include charsets or all the (?...) extensions or 
special sequences.
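To see why a grammar of this shape rejects a second quantifier, here is a minimal recursive-descent recognizer for it. It is a sketch under several assumptions, not Python's own `re` parser: the left-recursive union and concatenation rules are rewritten as loops (recursive descent cannot follow left recursion directly), alternation is read as `re "|" simple-re`, a lazy '?' is allowed after any quantifier, and charsets, the other (?...) extensions and special sequences are left out. `Recognizer`, `ParseError` and `is_valid` are hypothetical names.

```python
# Hypothetical recursive-descent recognizer for the simplified grammar.
# Left-recursive rules become loops; charsets, (?...) extensions other
# than (?:...), and special sequences are deliberately left out.

SPECIAL = set(".^$*+?{}[]\\|()")


class ParseError(Exception):
    pass


class Recognizer:
    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def at_end(self) -> bool:
        return self.pos >= len(self.text)

    def eat(self, token: str) -> bool:
        """Consume `token` if it comes next; report whether it did."""
        if self.text.startswith(token, self.pos):
            self.pos += len(token)
            return True
        return False

    def parse(self) -> None:
        self.regex()
        if not self.at_end():
            raise ParseError(f"unexpected {self.text[self.pos]!r} at {self.pos}")

    def regex(self) -> None:
        # re ::= simple-re ("|" simple-re)*
        self.simple_re()
        while self.eat("|"):
            self.simple_re()

    def simple_re(self) -> None:
        # simple-re ::= basic-re basic-re*   (stops before "|" or ")")
        self.basic_re()
        while not self.at_end() and self.text[self.pos] not in "|)":
            self.basic_re()

    def basic_re(self) -> None:
        # basic-re ::= element [quantifier]: at most ONE quantifier, so a
        # second one is left over and rejected as "not an element".
        self.element()
        self.quantifier()

    def element(self) -> None:
        if self.eat("(?:") or self.eat("("):
            self.regex()
            if not self.eat(")"):
                raise ParseError(f"missing ')' at {self.pos}")
        elif self.eat(".") or self.eat("^") or self.eat("$"):
            pass
        elif self.eat("\\"):
            if self.at_end():
                raise ParseError("dangling backslash")
            self.pos += 1  # "\" <any character>
        elif not self.at_end() and self.text[self.pos] not in SPECIAL:
            self.pos += 1  # <any non-special character>
        else:
            found = "end of pattern" if self.at_end() else repr(self.text[self.pos])
            raise ParseError(f"expected an element, found {found} at {self.pos}")

    def quantifier(self) -> None:
        # quantifier ::= ("*" | "+" | "?" | "{" N "}" | "{" N "," N "}") ["?"]
        if self.eat("{"):
            self.number()
            if self.eat(","):
                self.number()
            if not self.eat("}"):
                raise ParseError(f"missing '}}' at {self.pos}")
        elif not (self.eat("*") or self.eat("+") or self.eat("?")):
            return  # no quantifier here
        self.eat("?")  # optional lazy marker

    def number(self) -> None:
        start = self.pos
        while not self.at_end() and self.text[self.pos].isdigit():
            self.pos += 1
        if self.pos == start:
            raise ParseError(f"expected a number at {self.pos}")


def is_valid(pattern: str) -> bool:
    try:
        Recognizer(pattern).parse()
    except ParseError:
        return False
    return True


for p in (r"\s{6}+", r"(?:\s{6})+", r"(\s{6})+", r"a**", r"a{2,5}?"):
    print(f"{p:12} {is_valid(p)}")
```

Because `basic_re` consumes at most one quantifier, the `+` in `\s{6}+` is left over and fails as an element, so `is_valid(r"\s{6}+")` is False while `(?:\s{6})+` and `(\s{6})+` are accepted. Like the simplified grammar, the sketch also rejects some patterns Python accepts, such as the empty alternative in `a|`.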

-- 
Duncan Booth http://kupuguy.blogspot.com



More information about the Python-list mailing list