match.groupdict() into a single dict

Wed Apr 19 15:49:54 EDT 2017

On 2017-04-19 14:26, Ganesh Pal wrote:
> Hello friends,
> 
> I am learning regex and trying to use it in my scripts. I need some
> suggestions on the code below. I need to match all lines of a file that
> have a specific pattern and return them as a dictionary.
> 
> Sample line:
> 
> 'NODE=ADAM-11: | TIME=2017-04-14T05:27:16-07:00 |  COND=Some lovely message
> | MSG=attempt to record { addr=1,0,17080320:8192 action=xxhello-hell
> o owner=1:0070:001a::HEAD }, but history information has a different owner:
> owner: 1:0064:0005::HEAD, actions (new->old): { hello-hello
>   * 1, none, none, hello-hello * 1, none, none, hello-hello * 1, none, none,
> hello-hello * 1, none, none, hello-hello * 1, none, hello-h
> ello * 1, none } bh hello_cookie: 8:hello-only bhv | LINSNAP=None | MAP=none
> 
> 
> 
> with open("/tmp/2.repo","r") as f:
>       for line in f:
>           result = re.search(r'MSG=attempt to record(.*)LINSNAP', line)
>           if result:
>              pdb.set_trace()
>              for pattern in [
>                      r'(?P<Block>(\d+,\d+,\d+:\d+))',
>                      r'(?P<p_owner>([0-9a-fA-F]+:[0-9a-fA-F]+:[0-9a-fA-F]+::HEAD))',
>                      r'(?P<a_owner>(owner:\s+[0-9a-fA-F]+:[0-9a-fA-F]+:[0-9a-fA-F]+::HEAD))',
>                      ]:
>                  regex = re.compile(pattern)
>                  match = regex.search(line)
>                  print '  ', match.groupdict()
> 
> Sample output:
> 
>    {'Block': '1,0,17080320:8192'}
>     {'p_owner': '1:0070:001a::HEAD'}
>     {'a_owner': 'owner: 1:0064:0005::HEAD'}
> 
> Questions
> 
> 1. I was expecting a single dictionary with all the matches for every line,
> something like below
> 
>     {'Block': '1,0,17080320:8192', 'p_owner': '1:0070:001a::HEAD','a_owner':
> 'owner: 1:0064:0005::HEAD'}
> 
>     (a) I am thinking of adding these elements, {'Block': '1,0,17080320:8192'},
> {'p_owner': '1:0070:001a::HEAD'} ... etc., to a new dictionary,
>
>     (b) or perhaps there is a better regex, so that the for loop is not needed
> and a single compiled pattern does the job.
> 
> 
> I am a Linux user on Python 2.7. Thanks in advance :)
> 
Why would you expect a single dictionary? You're doing 3 separate matches!
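
If you keep the loop, option (a) from the question amounts to merging each
match's groupdict() into one dictionary. Here is a minimal sketch of that,
assuming a shortened copy of the sample line (the name "merged" and the
trimmed text are only for illustration). The redundant inner parentheses are
dropped, which does not change what groupdict() returns. It runs on both
Python 2.7 and Python 3.

```python
import re

# Shortened copy of the sample line from the question (illustrative only).
line = ("MSG=attempt to record { addr=1,0,17080320:8192 action=xxhello-hello "
        "owner=1:0070:001a::HEAD }, but history information has a different "
        "owner: owner: 1:0064:0005::HEAD, actions | LINSNAP=None")

patterns = [
    r'(?P<Block>\d+,\d+,\d+:\d+)',
    r'(?P<p_owner>[0-9a-fA-F]+:[0-9a-fA-F]+:[0-9a-fA-F]+::HEAD)',
    r'(?P<a_owner>owner:\s+[0-9a-fA-F]+:[0-9a-fA-F]+:[0-9a-fA-F]+::HEAD)',
]

merged = {}
for pattern in patterns:
    match = re.search(pattern, line)
    if match:  # re.search() returns None when the pattern is absent
        merged.update(match.groupdict())

print(merged)
```

The "if match" test also avoids an AttributeError on lines where one of the
patterns is missing.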

You could just combine the patterns as alternatives:

# The alternatives are matched repeatedly. The final '.' alternative
# will consume a character if none of the previous subpatterns match,
# ready for the next repeat.
subpatterns = [
    r'(?P<Block>(\d+,\d+,\d+:\d+))',
    r'(?P<p_owner>([0-9a-fA-F]+:[0-9a-fA-F]+:[0-9a-fA-F]+::HEAD))',
    r'(?P<a_owner>(owner:\s+[0-9a-fA-F]+:[0-9a-fA-F]+:[0-9a-fA-F]+::HEAD))',
    '.',
]
pattern = '(%s)*' % '|'.join(subpatterns)
match = re.search(pattern, line)
print '  ', match.groupdict()
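
Note that with a repeated group, each named group keeps only the text from
its last successful repeat. A possible variation, not what the code above
does, is to drop the catch-all '.' and let re.finditer() walk the line,
using match.lastgroup to see which alternative matched. This is only a
sketch: the shortened sample line and the name "found" are assumptions, the
inner parentheses are removed so that lastgroup names the outer group, and
setdefault() keeps the first occurrence of each name.

```python
import re

# Shortened copy of the sample line from the question (illustrative only).
line = ("MSG=attempt to record { addr=1,0,17080320:8192 action=xxhello-hello "
        "owner=1:0070:001a::HEAD }, but history information has a different "
        "owner: owner: 1:0064:0005::HEAD, actions | LINSNAP=None")

subpatterns = [
    r'(?P<Block>\d+,\d+,\d+:\d+)',
    r'(?P<p_owner>[0-9a-fA-F]+:[0-9a-fA-F]+:[0-9a-fA-F]+::HEAD)',
    r'(?P<a_owner>owner:\s+[0-9a-fA-F]+:[0-9a-fA-F]+:[0-9a-fA-F]+::HEAD)',
]
pattern = re.compile('|'.join(subpatterns))

found = {}
for match in pattern.finditer(line):
    # lastgroup is the name of the alternative that produced this match.
    found.setdefault(match.lastgroup, match.group())

print(found)
```

To keep the last occurrence instead, as the repeated-group version does,
assign with found[match.lastgroup] = match.group().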