Question: Optional Regular Expression Grouping

Vlastimil Brom vlastimil.brom at gmail.com
Mon Oct 10 18:59:46 EDT 2011


2011/10/10 galyle <galyle at gmail.com>:
> Hi, I've looked through this forum, but I haven't been able to find a
> resolution to the problem I'm having (maybe I didn't look hard enough
> -- I have to believe this has come up before).  The problem is this:
> I have a file which has 0, 2, or 3 groups that I'd like to record;
> however, in the case of 3 groups, the third group is correctly
> captured, but the first two groups get collapsed into just one group.
> I'm sure that I'm missing something in the way I've constructed my
> regular expression, but I can't figure out what's wrong.  Does anyone
> have any suggestions?
>
> The demo below showcases the problem I'm having:
>
> import re
>
> valid_line = re.compile('^\[(\S+)\]\[(\S+)\](?:\s+|\[(\S+)\])=|\s+[\d\[\']+.*$')
> line1 = "[field1][field2] = blarg"
> line2 = "    'a continuation of blarg'"
> line3 = "[field1][field2][field3] = blorg"
>
> m = valid_line.match(line1)
> print 'Expected: ' + m.group(1) + ', ' + m.group(2)
> m = valid_line.match(line2)
> print 'Expected: ' + str(m.group(1))
> m = valid_line.match(line3)
> print 'Uh-oh: ' + m.group(1) + ', ' + m.group(2)

Hi,
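I believe the space before = is causing the problem (the pattern has no
place for it): in line3 the third group is followed by " =", so the only
way the original pattern can match is for the greedy (\S+) to swallow
"field1][field2", since \S also matches brackets. Allow optional
whitespace with \s* before the =, and use non-greedy quantifiers +? so
that each group matches as little as possible, as opposed to the greedy
default: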

valid_line = re.compile(r'^\[(\S+?)\]\[(\S+?)\](?:\s+|\[(\S+)\])\s*=|\s+[\d\[\']+.*$')

or you can use character classes that explicitly exclude the closing ], like:

valid_line = re.compile(r'^\[([^\]]+)\]\[([^\]]+)\](?:\s+|\[([^\]]+)\])\s*=|\s+[\d\[\']+.*$')
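
A quick way to check both suggestions against the three sample lines
from the question is sketched below. This is only an illustration: the
helper name fields() and the variable names original, lazy and negated
are mine, and I am assuming Python 3 here (raw strings, print as a
function) rather than the Python 2 syntax used in the question.

```python
import re

# Sample lines taken from the question.
line1 = "[field1][field2] = blarg"
line2 = "    'a continuation of blarg'"
line3 = "[field1][field2][field3] = blorg"

# Pattern from the question: greedy \S+, no whitespace allowed before "=".
original = re.compile(r'^\[(\S+)\]\[(\S+)\](?:\s+|\[(\S+)\])=|\s+[\d\[\']+.*$')
# First suggestion: non-greedy groups plus \s* before "=".
lazy = re.compile(r'^\[(\S+?)\]\[(\S+?)\](?:\s+|\[(\S+)\])\s*=|\s+[\d\[\']+.*$')
# Second suggestion: negated character classes plus \s* before "=".
negated = re.compile(r'^\[([^\]]+)\]\[([^\]]+)\](?:\s+|\[([^\]]+)\])\s*=|\s+[\d\[\']+.*$')


def fields(pattern, line):
    """Return the captured groups as a tuple, or None if the line does not match."""
    m = pattern.match(line)
    return m.groups() if m else None


# Print what each pattern captures for each sample line.
for name, pattern in [("original", original), ("lazy", lazy), ("negated", negated)]:
    for line in (line1, line2, line3):
        print(name, repr(line), fields(pattern, line))
```

With the original pattern, line3 gives ('field1][field2', 'field3', None);
with either suggested pattern it gives ('field1', 'field2', 'field3').
Note that both suggestions still expect either whitespace or a third
[...] group before the =, so a line like "[a][b]=x" would not match.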

hth
 vbr


