Regex to extract multiple fields in the same line

Friedrich Rentsch anthra.norell at bluewin.ch
Fri Jun 15 10:31:39 EDT 2018



On 06/15/2018 12:37 PM, Ganesh Pal wrote:
> Hey Friedrich,
>
> The proposed solution worked nicely. Thank you for the reply, I really
> appreciate it.
>
>
> The only thing I think needs a review is whether the assignment of values
> from one dictionary to the other is done correctly (lines 17 to 25 in the
> code below).
>
>
> Here is my code:
>
> root at X1:/Play_ground/SPECIAL_TYPES/REGEX# vim Friedrich.py
>    1 import re
>    2 from collections import OrderedDict
>    3
>    4 keys = ["struct", "loc", "size", "mirror",
>    5         "filename","final_results"]
>    6
>    7 stats =  OrderedDict.fromkeys(keys)
>    8
>    9
>   10 line = '06/12/2018 11:13:23 AM python toolname.py  --struct=data_block --log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 --path=/tmp/data_block.txt --size=8'
>   11
>   12 regex = re.compile (r"--(struct|loc|size|mirror|log_file)\s*=\s*([^\s]+)")
>   13 result = dict(re.findall(regex, line))
>   14 print result
>   15
>   16 if result['log_file']:
>   17    stats['filename'] = result['log_file']
>   18 if result['struct']:
>   19    stats['struct'] = result['struct']
>   20 if result['size']:
>   21    stats['size'] = result['size']
>   22 if result['loc']:
>   23    stats['loc'] = result['loc']
>   24 if result['mirror']:
>   25    stats['mirror'] = result['mirror']
>   26
>   27 print stats
>   28
Looks okay to me. If you read 'result' using 'get', you wouldn't need
to test each key. 'stats' would then hold every key, with the value None
for keys missing from 'result':

stats['filename'] = result.get ('log_file')
stats['struct']   = result.get ('struct')

This may or may not suit your purpose.
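
As a minimal sketch of the 'get' approach, all five assignments can be
done in one loop. The 'mapping' dictionary that pairs each 'stats' key
with its 'result' key is only an illustration and not part of your
script, and the regex is the version without \s* that you suggest
further down:

```python
import re
from collections import OrderedDict

line = ('06/12/2018 11:13:23 AM python toolname.py  --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')

regex = re.compile(r"--(struct|loc|size|mirror|log_file)=([^\s]+)")
result = dict(regex.findall(line))

keys = ["struct", "loc", "size", "mirror", "filename", "final_results"]
stats = OrderedDict.fromkeys(keys)

# Hypothetical helper: key in 'stats' -> key in 'result'.
mapping = {"filename": "log_file", "struct": "struct", "size": "size",
           "loc": "loc", "mirror": "mirror"}

for stats_key, result_key in mapping.items():
    # get() returns None for a missing key instead of raising KeyError.
    stats[stats_key] = result.get(result_key)

print(stats)
```

One difference to keep in mind: result['log_file'] raises a KeyError
when the option is absent from the line, whereas result.get('log_file')
quietly returns None.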
>
> Also, I think the regex can just be
> (r"--(struct|loc|size|mirror|log_file)=([^\s]+)")
> There is no need to match whitespace (\s*) before and after the =
> symbol, because that would never occur (this line is actually a key=value
> pair of a dictionary getting logged).
>
You are right. I thought your sample line had a space in one of the 
groups and didn't reread to verify, letting the false impression take 
hold. Sorry about that.
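
For what it is worth, here is a small check, assuming the sample line
ends in --size=8, that both patterns give the same pairs on it. The
names 'with_ws', 'without_ws' and 'spaced' are made up for the
comparison; the two patterns only differ on input that has spaces
around the = sign:

```python
import re

line = ('06/12/2018 11:13:23 AM python toolname.py  --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')

# Pattern that tolerates whitespace around "=".
with_ws = re.compile(r"--(struct|loc|size|mirror|log_file)\s*=\s*([^\s]+)")
# Simplified pattern: "=" must follow the option name directly.
without_ws = re.compile(r"--(struct|loc|size|mirror|log_file)=([^\s]+)")

pairs = without_ws.findall(line)
print(pairs)
# [('struct', 'data_block'), ('log_file', '/var/1000111/test18.log'),
#  ('loc', '0'), ('mirror', '10'), ('size', '8')]

# On the sample line both patterns agree.
same_on_sample = with_ws.findall(line) == pairs

# Hypothetical input with spaces around "=": only with_ws matches.
spaced = '--loc = 5 --size= 8'
spaced_with_ws = with_ws.findall(spaced)        # [('loc', '5'), ('size', '8')]
spaced_without_ws = without_ws.findall(spaced)  # []
```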

Frederic


> Regards,
> Ganesh
>
>
>
>
>
>
> On Fri, Jun 15, 2018 at 12:53 PM, Friedrich Rentsch <
> anthra.norell at bluewin.ch> wrote:
>
>> Hi Ganesh. Having proposed a solution to your problem, it would be kind
>> of you to let me know whether it has helped. In case you missed my
>> response, I repeat it:
>>
>> >>> regex = re.compile (r"--(struct|loc|size|mirror|log_file)\s*=\s*([^\s]+)")
>> >>> regex.findall (line)
>> [('struct', 'data_block'), ('log_file', '/var/1000111/test18.log'),
>> ('loc', '0'), ('mirror', '10')]
>>
>> Frederic
>>
>>
>> On 06/13/2018 07:32 PM, Ganesh Pal wrote:
>>
>>> On Wed, Jun 13, 2018 at 5:59 PM, Rhodri James <rhodri at kynesim.co.uk>
>>> wrote:
>>>
>>> On 13/06/18 09:08, Ganesh Pal wrote:
>>>>     Hi Team,
>>>>> I wanted to parse a file and extract a few fields that are present after
>>>>> "=" in a text file.
>>>>>
>>>>>
>>>>> For example, from the line below I need to extract the values present
>>>>> after --struct =, --loc=, --size= and --log_file=
>>>>>
>>>>> Sample input
>>>>>
>>>>> line = '06/12/2018 11:13:23 AM python toolname.py  --struct=data_block
>>>>> --log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10
>>>>> --path=/tmp/data_block.txt size=8'
>>>>>
>>>>> Did you mean "--size=8" at the end?  That's what your explanation
>>>> implied.
>>>>
>>>
>>>
>>> Yes James, you got it right. I meant "--size=8".
>>>
>>>
>>> Hi Team,
>>>
>>>
>>> I played further with Python's re.findall() and I am able to extract all
>>> the required fields. I have 2 further questions too; please advise.
>>>
>>>
>>> Question 1:
>>>
>>>    Please let me know the mistakes in the code below, and suggest whether
>>> it can be optimized further with a better regex.
>>>
>>>
>>> # This code has to extract various fields from a single line
>>> # (assuming the line is matched here) of a log file that contains various
>>> # values, and then store the extracted values in a dictionary.
>>>
>>> import re
>>>
>>> line = '06/12/2018 11:13:23 AM python toolname.py  --struct=data_block
>>> --log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10
>>> --path=/tmp/data_block.txt --size=8'
>>>
>>> # loc is a number
>>> r_loc = r"--loc=([0-9]+)"
>>> r_size = r'--size=([0-9]+)'
>>> r_struct = r'--struct=([A-Za-z_]+)'
>>> r_log_file = r'--log_file=([A-Za-z0-9_/.]+)'
>>>
>>>
>>> if re.findall(r_loc, line):
>>>      print re.findall(r_loc, line)
>>>
>>> if re.findall(r_size, line):
>>>      print re.findall(r_size, line)
>>>
>>> if re.findall(r_struct, line):
>>>      print re.findall(r_struct, line)
>>>
>>> if re.findall(r_log_file, line):
>>>      print re.findall(r_log_file, line)
>>>
>>>
>>> o/p:
>>> root at X1:/Play_ground/SPECIAL_TYPES/REGEX# python regex_002.py
>>> ['0']
>>> ['8']
>>> ['data_block']
>>> ['/var/1000111/test18.log']
>>>
>>>
>>> Question 2:
>>>
>>> I tried to see if I can use re.search with a lookbehind assertion. It
>>> seems to work. Any comments or suggestions?
>>>
>>> Example:
>>>
>>> import re
>>>
>>> line = '06/12/2018 11:13:23 AM python toolname.py  --struct=data_block
>>> --log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10
>>> --path=/tmp/data_block.txt --size=8'
>>>
>>> match = re.search(r'(?P<loc>(?<=--loc=)([0-9]+))', line)
>>> if match:
>>>      print match.group('loc')
>>>
>>>
>>> o/p: root at X1:/Play_ground/SPECIAL_TYPES/REGEX# python regex_002.py
>>>
>>> 0
>>>
>>>
>>> I want to build the sub-patterns and use match.group() to get the values,
>>> something as shown below, but it doesn't seem to work:
>>>
>>>
>>> match = re.search(r'(?P<loc>(?<=--loc=)([0-9]+))'
>>>                     r'(?P<size>(?<=--size=)([0-9]+))', line)
>>> if match:
>>>      print match.group('loc')
>>>      print match.group('size')
>>>
>>> Regards,
>>> Ganesh
>>>
>>
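
On Question 2 in the quoted message: the combined pattern cannot match
because the two groups are adjacent. The 'size' group has to start
exactly where the 'loc' group ends, and at that position the preceding
text ends in a digit, not in "--size=", so the second lookbehind always
fails. A rough sketch of two possible alternatives follows; the names
'adjacent', 'ordered' and 'unordered' are only illustrative, and the
sample line is assumed to end in --size=8:

```python
import re

line = ('06/12/2018 11:13:23 AM python toolname.py  --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')

# The original attempt: adjacent groups, so the second lookbehind
# can never be satisfied and search() returns None.
adjacent = re.search(r'(?P<loc>(?<=--loc=)([0-9]+))'
                     r'(?P<size>(?<=--size=)([0-9]+))', line)

# Alternative 1: allow arbitrary text between the two options.
# This assumes --loc always appears before --size on the line.
ordered = re.search(r'--loc=(?P<loc>[0-9]+).*?--size=(?P<size>[0-9]+)', line)

# Alternative 2: one lookahead per option, both anchored at the start,
# so the order of the options on the line does not matter.
unordered = re.search(r'^(?=.*--loc=(?P<loc>[0-9]+))'
                      r'(?=.*--size=(?P<size>[0-9]+))', line)

if ordered:
    print(ordered.group('loc'))
    print(ordered.group('size'))
if unordered:
    print(unordered.groupdict())
```

With either alternative the match is None as soon as one of the options
is missing, so for optional fields the findall/dict approach above is
probably the more forgiving choice.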



