[Tutor] regex grouping/capturing

Fri Jun 14 10:48:13 CEST 2013

----- Original Message -----
> From: Andreas Perstinger <andipersti at gmail.com>
> To: tutor at python.org
> Cc: 
> Sent: Thursday, June 13, 2013 8:09 PM
> Subject: Re: [Tutor] regex grouping/capturing
> 
> On 13.06.2013 17:09, Albert-Jan Roskam wrote:
>> I have a string of the form "required optional3 optional2 optional1
>> optional3" ('optional' may be any kind of string, so it's not simply
>> 'optional\d+').
>> I would like to use a regex so I can distinguish groups. Desired
>> outcome: ('required', 'optional3', 'optional2', 'optional1',
>> 'optional3'). Below is a fragment of the many things I have tried.
> [SNIP]
>> How can I make this work?
> 
> If you really want to use a regex:
> >>> import re
> >>> s = "required optional3 optional2 optional1 optional3"
> >>> s2 = "required optional1 optional2 optional3"
> >>> pattern = "required|optional1|optional2|optional3"
> >>> re.findall(pattern, s)
> ['required', 'optional3', 'optional2', 'optional1', 'optional3']
> >>> re.findall(pattern, s2)
> ['required', 'optional1', 'optional2', 'optional3']

Hi Andreas, thanks for your reply. I am trying to create a pygments regex lexer. It parses code and classifies it into (in my case) commands, subcommands and keywords. AFAIK, re.findall can't be used with pygments, but maybe I am mistaken. The quantifier on a group (a plus sign in my case) just works differently from what I expected: a repeated group gives me only one captured value, not one per repetition. It seems that only optional groups (with a "?") can be used, not other quantifiers. Here's a simplified example of the 'set' command that I would like to parse.

>>> s = 'set workspace = 6148 header on.'
>>> r = r"(set)\s+(header|workspace)+\s*=?\s*.*\.$"
>>> re.search(r, s, re.I).groups()
('set', 'workspace')  # desired output: ('set', 'workspace', 'header')
>>> r = r"(set)\s+(?:(header|workspace)\s*=?\s*.*)+\.$"
>>> re.search(r, s, re.I).groups()
('set', 'workspace')  # grrr, still no luck
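
For reference, a minimal sketch of what seems to be going on, using plain `re` only. In Python's `re` module a capturing group that is repeated keeps only its last repetition, so one group can never yield both 'workspace' and 'header'. In the patterns above the greedy `.*` also swallows 'header' before the group gets a second chance. One possible workaround, assuming a two-step approach is acceptable, is to match the command first and then scan the remainder with `re.findall`. The names `cmd`, `subcommands` and `result` are illustrative, and whether this fits into a pygments lexer is not addressed here.

```python
import re

s = 'set workspace = 6148 header on.'

# A repeated capturing group keeps only its LAST repetition.
m = re.match(r"(?:(header|workspace)\s*)+", "workspace header")
last_only = m.group(1)
print(last_only)  # header

# Two-step workaround (an assumption, not a pygments-specific solution):
# 1) match the command name and grab everything up to the final period,
# 2) scan that remainder for the known subcommand names.
cmd = re.match(r"(set)\s+(.*)\.$", s, re.I)
subcommands = re.findall(r"\b(header|workspace)\b", cmd.group(2), re.I)
result = (cmd.group(1),) + tuple(subcommands)
print(result)  # ('set', 'workspace', 'header')
```

The second step is essentially the `re.findall` idea from the reply above, applied only to the part of the string that follows the command name.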