Trouble with regular expressions

John Machin sjmachin at lexicon.net
Sat Feb 7 15:40:09 EST 2009


On Feb 8, 1:37 am, MRAB <goo... at mrabarnett.plus.com> wrote:
> LaundroMat wrote:
> > Hi,
>
> > I'm quite new to regular expressions, and I wonder if anyone here
> > could help me out.
>
> > I'm looking to split strings that ideally look like this: "Update: New
> > item (Household)" into a group.
> > This expression works ok: '^(Update:)?(.*)(\(.*\))$' - it returns
> > ("Update", "New item", "(Household)")
>
> > Some strings will look like this however: "Update: New item (item)
> > (Household)". The expression above still does its job, as it returns
> > ("Update", "New item (item)", "(Household)").

Not quite true; it actually returns
    ('Update:', ' New item (item) ', '(Household)')
However, ignoring the difference in whitespace, the OP's intention is
clear. Yours returns
    ('Update:', ' New item ', '(item) (Household)')
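
Both claims are easy to check by running the patterns. The sketch below assumes the patterns are applied with re.match() and that the tuples come from .groups(), because the thread does not show the calling code. The last pattern, VARIANT_PATTERN, is a hypothetical variant that is not proposed anywhere in the thread. It assumes the trailing parenthesised group never contains nested parentheses.

```python
import re

OP_PATTERN = r'^(Update:)?(.*)(\(.*\))$'            # the OP's pattern
MRAB_PATTERN = r'^(Update:)?(.*?)(?:(\(.*\)))?$'    # MRAB's suggestion

text = "Update: New item (item) (Household)"

# Greedy (.*) gives back characters only until (\(.*\))$ can match,
# so it stops just before the LAST "(".
op_groups = re.match(OP_PATTERN, text).groups()

# Lazy (.*?) takes as few characters as possible, so the optional group
# starts at the FIRST "(" and swallows "(item) (Household)".
mrab_groups = re.match(MRAB_PATTERN, text).groups()

print(op_groups)    # ('Update:', ' New item (item) ', '(Household)')
print(mrab_groups)  # ('Update:', ' New item ', '(item) (Household)')

# Hypothetical variant, NOT from the thread: forbid parentheses inside the
# trailing group so it can only match the final "(...)" (assumes no nesting).
VARIANT_PATTERN = r'(Update:)?(.*?)(\([^()]*\))?$'
variant_results = [
    re.match(VARIANT_PATTERN, s).groups()
    for s in (text, "Update: new item")
]
print(variant_results)
# [('Update:', ' New item (item) ', '(Household)'), ('Update:', ' new item', None)]
```

The [^()]* restriction makes the lazy middle group keep expanding past "(item)". From that position the trailing group cannot reach the end of the string, so the match only succeeds when the group starts at "(Household)".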


> > It does not work however when there is no text in parentheses (eg
> > "Update: new item"). How can I get the expression to return a tuple
> > such as ("Update:", "new item", None)?
>
> You need to make the last group optional and also make the middle group
> lazy: r'^(Update:)?(.*?)(?:(\(.*\)))?$'.
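
Here is a short check of the suggestion above. It assumes re.match() and .groups() are used, since the thread does not show the calling code. It also shows why both changes are needed: the OP's pattern fails outright without a trailing "(...)", and a greedy middle group would eat the parentheses itself.

```python
import re

OP_PATTERN = r'^(Update:)?(.*)(\(.*\))$'
MRAB_PATTERN = r'^(Update:)?(.*?)(?:(\(.*\)))?$'

# The OP's pattern requires a trailing "(...)", so it does not match at all.
print(re.match(OP_PATTERN, "Update: new item"))  # None

samples = [
    "Update: New item (Household)",
    "Update: new item",
    "New item (Household)",
]
# With the last group optional and the middle group lazy, every sample
# matches; a missing part comes back as None.
results = [re.match(MRAB_PATTERN, s).groups() for s in samples]
for r in results:
    print(r)
# ('Update:', ' New item ', '(Household)')
# ('Update:', ' new item', None)
# (None, 'New item ', '(Household)')

# Why lazy: with a greedy (.*) the middle group consumes "(Household)" too,
# and the optional last group is simply skipped.
greedy = re.match(r'^(Update:)?(.*)(?:(\(.*\)))?$', samples[0]).groups()
print(greedy)  # ('Update:', ' New item (Household)', None)
```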

Why do you perpetuate the redundant ^ anchor? re.match() already
anchors the match at the start of the string.
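
The point about ^ holds when the pattern is used with re.match(). That is an assumption, because the thread never shows which function is called. Under that assumption, dropping ^ from MRAB's pattern changes nothing. The anchor only matters for a function such as re.search(), which may start matching later in the string.

```python
import re

text = "Update: New item (Household)"

# MRAB's pattern as posted, and the same pattern with the leading ^ removed.
with_caret = re.match(r'^(Update:)?(.*?)(?:(\(.*\)))?$', text).groups()
without_caret = re.match(r'(Update:)?(.*?)(?:(\(.*\)))?$', text).groups()
print(with_caret == without_caret)  # True

# re.search() scans forward, so there the anchor does change the result.
print(re.search(r'^Update:', "New Update: item"))         # None
print(re.search(r'Update:', "New Update: item").span())  # (4, 11)
```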

> (?:...) is the non-capturing version of (...).

Why do you use
    (?:(subpattern))?
instead of just plain
    (subpattern)?
?
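
Wrapping a single capturing group in a non-capturing group adds nothing when the quantifier applies to the whole thing. The minimal check below compares MRAB's pattern with the same pattern written as plain (subpattern)?. The sample strings are only illustrative. Pattern.groups reports the number of capturing groups, which is the same in both.

```python
import re

# MRAB's pattern: the last capturing group is wrapped in (?:...)?
wrapped = re.compile(r'^(Update:)?(.*?)(?:(\(.*\)))?$')
# The same pattern with the redundant non-capturing wrapper removed.
plain = re.compile(r'^(Update:)?(.*?)(\(.*\))?$')

samples = [
    "Update: New item (Household)",
    "Update: New item (item) (Household)",
    "Update: new item",
    "New item",
]
for s in samples:
    # Same three capturing groups, same captured text for every sample.
    assert wrapped.match(s).groups() == plain.match(s).groups()

print(wrapped.groups, plain.groups)  # 3 3
```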




More information about the Python-list mailing list