Question: Optional Regular Expression Grouping

MRAB python at mrabarnett.plus.com
Mon Oct 10 18:49:13 EDT 2011


On 10/10/2011 22:57, galyle wrote:
> Hi, I've looked through this forum, but I haven't been able to find a
> resolution to the problem I'm having (maybe I didn't look hard enough
> -- I have to believe this has come up before).  The problem is this:
> I have a file which has 0, 2, or 3 groups that I'd like to record;
> however, in the case of 3 groups, the third group is correctly
> captured, but the first two groups get collapsed into just one group.
> I'm sure that I'm missing something in the way I've constructed my
> regular expression, but I can't figure out what's wrong.  Does anyone
> have any suggestions?
>
> The demo below showcases the problem I'm having:
>
> import re
>
> valid_line = re.compile('^\[(\S+)\]\[(\S+)\](?:\s+|\[(\S+)\])=|\s+[\d\[\']+.*$')
> line1 = "[field1][field2] = blarg"
> line2 = "    'a continuation of blarg'"
> line3 = "[field1][field2][field3] = blorg"
>
> m = valid_line.match(line1)
> print 'Expected: ' + m.group(1) + ', ' + m.group(2)
> m = valid_line.match(line2)
> print 'Expected: ' + str(m.group(1))
> m = valid_line.match(line3)
> print 'Uh-oh: ' + m.group(1) + ', ' + m.group(2)

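Instead of "\S" I'd recommend using "[^\]]", or using a lazy repetition
"\S+?". "\S" matches any non-whitespace character, including "]" and "[",
so the greedy "\S+" in the first group can run past the "]" and, in line3,
captures "field1][field2", leaving "field3" for the second group.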
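To see the difference, here is a small Python 3 sketch that matches only the first two bracketed fields of line3 with each of the three choices. The variable names greedy, negated and lazy are just for illustration.

```python
import re

line3 = "[field1][field2][field3] = blorg"

# Greedy \S+ may consume "]" and "[", then backtracks only as far as
# needed for the rest of the pattern to match.
greedy = re.match(r'\[(\S+)\]\[(\S+)\]', line3)

# The negated class [^\]]+ cannot consume "]", so each group stops
# at its own closing bracket.
negated = re.match(r'\[([^\]]+)\]\[([^\]]+)\]', line3)

# The lazy \S+? takes as few characters as possible before trying "]".
lazy = re.match(r'\[(\S+?)\]\[(\S+?)\]', line3)

print(greedy.groups())   # ('field1][field2', 'field3')
print(negated.groups())  # ('field1', 'field2')
print(lazy.groups())     # ('field1', 'field2')
```

The negated class is the more robust of the two fixes: it states directly that a field can't contain "]", whereas the lazy form relies on the order in which the engine tries lengths.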

You'll also need to handle the space before the "=" in line3.

valid_line = re.compile(r'^\[([^\]]+)\]\[([^\]]+)\](?:\s+|\[([^\]]+)\])\s*=|\s+[\d\[\']+.*$')
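A quick way to check the suggested pattern is to run the original demo against it. The sketch below is a Python 3 rewrite of that demo (the original uses Python 2 print statements). It also compiles a hypothetical no_space variant without the "\s*" to show why the space before "=" in line3 needs handling.

```python
import re

# Suggested pattern: [^\]]+ keeps each group inside its own brackets,
# and \s* before "=" allows the space that follows "[field3]".
valid_line = re.compile(
    r'^\[([^\]]+)\]\[([^\]]+)\](?:\s+|\[([^\]]+)\])\s*=|\s+[\d\[\']+.*$'
)

# Same pattern without the \s*, for comparison only.
no_space = re.compile(
    r'^\[([^\]]+)\]\[([^\]]+)\](?:\s+|\[([^\]]+)\])=|\s+[\d\[\']+.*$'
)

line1 = "[field1][field2] = blarg"
line2 = "    'a continuation of blarg'"
line3 = "[field1][field2][field3] = blorg"

for line in (line1, line2, line3):
    print(valid_line.match(line).groups())
# ('field1', 'field2', None)
# (None, None, None)
# ('field1', 'field2', 'field3')

# Without \s*, nothing allows "[field3]" to be followed by " =".
print(no_space.match(line3))  # None
```

Note that line2 matches through the second alternative, so all three groups are None there, which is what the original "str(m.group(1))" call expected.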



More information about the Python-list mailing list