+ in regular expression
Duncan Booth
duncan.booth at invalid.invalid
Fri Oct 5 05:23:26 EDT 2012
Cameron Simpson <cs at zip.com.au> wrote:
> On 03Oct2012 21:17, Ian Kelly <ian.g.kelly at gmail.com> wrote:
>| On Wed, Oct 3, 2012 at 9:01 PM, contro opinion
>| <contropinion at gmail.com> wrote:
>| > why the "\s{6}+" is not a regular pattern?
>|
>| Use a group: "(?:\s{6})+"
>
> Yeah, it is probably a precedence issue in the grammar.
> "(\s{6})+" is also accepted.
It's about syntax, not precedence, but the documentation doesn't really
spell it out in full. Like most regex documentation it talks in woolly
terms about special characters rather than giving a formal syntax.
A regular expression element may be followed by a quantifier.
Quantifiers are '*', '+', '?', '{n}', '{n,m}' (and the lazy quantifiers
'*?', '+?', '??', '{n,m}?'). There's nothing in the regex language which says
you can follow an element with two quantifiers. Parentheses (grouping or
non-grouping) around a regex turn that regex into a single element which
is why you can then use another quantifier.
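
A quick way to see the rule in action is to ask the re module which
patterns it will compile. This is only a sketch: the helper name
compiles() is made up for illustration. It uses "\s{6}*" rather than the
"\s{6}+" from the question because Python 3.11 and later read a "+" after
a quantifier as a possessive quantifier, so that exact pattern is no
longer rejected there. Older versions reject it as a "multiple repeat".

```python
import re

def compiles(pattern):
    """Hypothetical helper: True if re accepts the pattern, else False."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

# A quantifier directly after another quantifier is rejected.
print(compiles(r"\s{6}*"))

# Wrapping the quantified element in a group (capturing or not) makes it
# a single element again, so a second quantifier is allowed.
print(compiles(r"(?:\s{6})*"))
print(compiles(r"(?:\s{6})+"))
print(compiles(r"(\s{6})+"))

# The grouped form matches whitespace in multiples of six characters.
print(re.fullmatch(r"(?:\s{6})+", " " * 12) is not None)
print(re.fullmatch(r"(?:\s{6})+", " " * 7) is not None)
```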
In BNF, I think Python's regexes would be something like:
re ::= union | simple-re
union ::= re "|" simple-re
simple-re ::= concatenation | basic-re
concatenation ::= simple-re basic-re
basic-re ::= element | element quantifier
element ::= group | nc-group | "." | "^" | "$" | char | charset
quantifier ::= "*" | "+" | "?" | "{" NUMBER "}" | "{" NUMBER "," NUMBER "}"
    | "*?" | "+?" | "??" | "{" NUMBER "," NUMBER "}?"
group ::= "(" re ")"
nc-group ::= "(?:" re ")"
char ::= <any non-special character> | "\" <any character>
... and so on. I didn't include charsets or all the (?...) extensions or
special sequences.
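
To make the grammar concrete, here is a rough recursive-descent
recognizer for it. It is a sketch under assumptions, not how the re
module actually parses patterns. The names parse(), accepts() and
ParseError are invented for illustration, the left-recursive rules are
rewritten as loops, charsets and the other (?...) extensions are left
out, and it is a little more generous than the grammar in allowing a
trailing "?" after any quantifier. Unlike the real module it also
rejects empty alternatives and an unmatched "{".

```python
import re

SPECIAL = set(".^$*+?{}()|[]\\")
# quantifier: "*", "+", "?", "{n}" or "{n,m}", each optionally lazy
QUANT = re.compile(r"(?:[*+?]|\{\d+(?:,\d+)?\})\??")

class ParseError(ValueError):
    pass

def parse(pattern):
    """Return True if pattern fits the simplified grammar, else raise."""
    pos = parse_re(pattern, 0)
    if pos != len(pattern):
        raise ParseError(f"unexpected {pattern[pos]!r} at {pos}")
    return True

def parse_re(s, i):
    # re ::= simple-re ("|" simple-re)*
    i = parse_simple(s, i)
    while i < len(s) and s[i] == "|":
        i = parse_simple(s, i + 1)
    return i

def parse_simple(s, i):
    # simple-re ::= basic-re basic-re ...   (concatenation)
    i = parse_basic(s, i)
    while i < len(s) and s[i] not in "|)":
        i = parse_basic(s, i)
    return i

def parse_basic(s, i):
    # basic-re ::= element | element quantifier  (at most ONE quantifier)
    i = parse_element(s, i)
    m = QUANT.match(s, i)
    return m.end() if m else i

def parse_element(s, i):
    if i >= len(s):
        raise ParseError("expected an element at end of pattern")
    c = s[i]
    if c == "(":
        # group ::= "(" re ")"    nc-group ::= "(?:" re ")"
        i = parse_re(s, i + (3 if s.startswith("(?:", i) else 1))
        if i >= len(s) or s[i] != ")":
            raise ParseError("missing )")
        return i + 1
    if c == "\\":
        if i + 1 >= len(s):
            raise ParseError("dangling backslash")
        return i + 2
    if c in ".^$" or c not in SPECIAL:
        return i + 1
    # A quantifier here means there is no element for it to apply to.
    raise ParseError(f"unexpected {c!r} at {i}")

def accepts(pattern):
    try:
        return parse(pattern)
    except ParseError:
        return False

print(accepts(r"\s{6}+"))      # second quantifier has no element
print(accepts(r"(?:\s{6})+"))  # the group is a single element
```

Under this grammar "\s{6}+" fails because, after "\s{6}" has been read
as an element plus its one quantifier, the "+" has to start a new
element, and a bare quantifier is not one.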
--
Duncan Booth http://kupuguy.blogspot.com