Question: Optional Regular Expression Grouping

galyle galyle at gmail.com
Mon Oct 10 19:24:13 EDT 2011


On Oct 10, 4:59 pm, Vlastimil Brom <vlastimil.b... at gmail.com> wrote:
> 2011/10/10 galyle <gal... at gmail.com>:
>
> > Hi, I've looked through this forum, but I haven't been able to find a
> > resolution to the problem I'm having (maybe I didn't look hard enough
> > -- I have to believe this has come up before).  The problem is this:
> > I have a file which has 0, 2, or 3 groups that I'd like to record;
> > however, in the case of 3 groups, the third group is correctly
> > captured, but the first two groups get collapsed into just one group.
> > I'm sure that I'm missing something in the way I've constructed my
> > regular expression, but I can't figure out what's wrong.  Does anyone
> > have any suggestions?
>
> > The demo below showcases the problem I'm having:
>
> > import re
>
> > valid_line = re.compile('^\[(\S+)\]\[(\S+)\](?:\s+|\[(\S+)\])=|\s+[\d\[\']+.*$')
> > line1 = "[field1][field2] = blarg"
> > line2 = "    'a continuation of blarg'"
> > line3 = "[field1][field2][field3] = blorg"
>
> > m = valid_line.match(line1)
> > print 'Expected: ' + m.group(1) + ', ' + m.group(2)
> > m = valid_line.match(line2)
> > print 'Expected: ' + str(m.group(1))
> > m = valid_line.match(line3)
> > print 'Uh-oh: ' + m.group(1) + ', ' + m.group(2)
>
> Hi,
> I believe the space before = is causing problems (the pattern does not
> allow for it after the third group); you also need non-greedy quantifiers
> +? to match as little as possible, as opposed to the greedy default:
>
> valid_line = re.compile('^\[(\S+?)\]\[(\S+?)\](?:\s+|\[(\S+)\])\s*=|\s+[\d\[\']+.*$')
>
> or you can use character classes that explicitly exclude the closing ], like:
>
> valid_line = re.compile('^\[([^\]]+)\]\[([^\]]+)\](?:\s+|\[([^\]]+)\])\s*=|\s+[\d\[\']+.*$')
>
> hth
>  vbr

Thanks, I had a feeling that greedy matching in my expression was
causing the problem.  Your suggestion makes sense to me, and works quite
well.
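
For the archive, here is a minimal sketch comparing the original greedy
pattern with the negated-character-class version on the three sample lines.
It assumes Python 3 syntax and raw strings; the names `greedy`, `fixed`, and
`fields` are just illustrative and not from the thread.

```python
import re

# Original pattern: \S+ is greedy and also matches ']' and '[', so on a
# three-field line the first group can swallow "field1][field2".
greedy = re.compile(r"^\[(\S+)\]\[(\S+)\](?:\s+|\[(\S+)\])=|\s+[\d\[']+.*$")

# Suggested fix: [^\]]+ cannot cross a closing bracket, and \s* allows
# optional whitespace before '='.
fixed = re.compile(
    r"^\[([^\]]+)\]\[([^\]]+)\](?:\s+|\[([^\]]+)\])\s*=|\s+[\d\[']+.*$"
)

line1 = "[field1][field2] = blarg"
line2 = "    'a continuation of blarg'"
line3 = "[field1][field2][field3] = blorg"


def fields(pattern, line):
    """Return the tuple of captured groups, or None if the line does not match."""
    m = pattern.match(line)
    return m.groups() if m else None


for line in (line1, line2, line3):
    print(repr(line))
    print("  greedy:", fields(greedy, line))
    print("  fixed: ", fields(fixed, line))
```

A continuation line such as line2 matches the second alternative, so all
three groups come back as None and need to be checked before use.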


