[Tutor] regex grouping/capturing
Albert-Jan Roskam
fomcl at yahoo.com
Fri Jun 14 10:48:13 CEST 2013
----- Original Message -----
> From: Andreas Perstinger <andipersti at gmail.com>
> To: tutor at python.org
> Cc:
> Sent: Thursday, June 13, 2013 8:09 PM
> Subject: Re: [Tutor] regex grouping/capturing
>
> On 13.06.2013 17:09, Albert-Jan Roskam wrote:
>> I have a string of the form "required optional3 optional2 optional1
>> optional3" ('optional' may be any kind of string, so it's not simply
>> 'optional\d+').
>> I would like to use a regex so I can distinguish groups. Desired
>> outcome: ('required', 'optional3', 'optional2', 'optional1',
>> 'optional3'). Below is a fragment of the many things I have tried.
> [SNIP]
>> How can I make this work?
>
> If you really want to use a regex:
> >>> import re
> >>> s = "required optional3 optional2 optional1 optional3"
> >>> s2 = "required optional1 optional2 optional3"
> >>> pattern = "required|optional1|optional2|optional3"
> >>> re.findall(pattern, s)
> ['required', 'optional3', 'optional2', 'optional1', 'optional3']
> >>> re.findall(pattern, s2)
> ['required', 'optional1', 'optional2', 'optional3']
Hi Andreas, thanks for your reply. I am trying to create a pygments regex lexer. It parses code and classifies it into (in my case) commands, subcommands and keywords. AFAIK, re.findall can't be used with pygments, but maybe I am mistaken. The quantifier on a group (a plus sign in my case) just works differently from what I expect. It seems that only optional groups (with a "?") can be used, not other quantifiers. Here's a simplified example of the 'set' command that I would like to parse.
>>> s = 'set workspace = 6148 header on.'
>>> r = r"(set)\s+(header|workspace)+\s*=?\s*.*\.$"
>>> re.search(r, s, re.I).groups()
('set', 'workspace') # desired output: ('set', 'workspace', 'header')
>>> r = r"(set)\s+(?:(header|workspace)\s*=?\s*.*)+\.$"
>>> re.search(r, s, re.I).groups()
('set', 'workspace') # grrr, still no luck
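
A minimal sketch of what seems to be going on, assuming standard `re` semantics: a capturing group inside a repetition is overwritten on each iteration, so .groups() only ever holds the last text it matched. In the second pattern above, the greedy `.*` also swallows the rest of the line during the first iteration, so 'header' never reaches the group at all. One possible workaround is to match in two passes; the helper name parse_set below is hypothetical and only covers this simplified 'set' example, and whether it fits into a pygments lexer is a separate question.

```python
import re

# A quantified capturing group is overwritten on every iteration,
# so only the text of the last repetition survives in .groups().
last_only = re.match(r"(?:(\w+)\s*)+", "a b c").groups()
print(last_only)


def parse_set(command):
    """Hypothetical helper: return ('set', keyword, keyword, ...) or None."""
    # Pass 1: recognise the command and grab everything up to the final dot.
    m = re.match(r"(set)\s+(.*)\.$", command, re.I)
    if m is None:
        return None
    # Pass 2: collect every known keyword in the remainder, in order.
    keywords = re.findall(r"\b(header|workspace)\b", m.group(2), re.I)
    return (m.group(1),) + tuple(keywords)


print(parse_set('set workspace = 6148 header on.'))
```

The first print shows ('c',) and the second shows ('set', 'workspace', 'header'), which is the desired output from the session above.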