Regex to extract multiple fields in the same line
Ganesh Pal
ganesh1pal at gmail.com
Wed Jun 13 13:32:35 EDT 2018
On Wed, Jun 13, 2018 at 5:59 PM, Rhodri James <rhodri at kynesim.co.uk> wrote:
> On 13/06/18 09:08, Ganesh Pal wrote:
>
>> Hi Team,
>>
>> I want to parse a text file and extract a few fields that are present
>> after "=".
>>
>>
>> For example, from the line below I need to extract the values present after
>> --struct=, --loc=, --size= and --log_file=
>>
>> Sample input
>>
>> line = '06/12/2018 11:13:23 AM python toolname.py --struct=data_block
>> --log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10
>> --path=/tmp/data_block.txt size=8'
>>
>
> Did you mean "--size=8" at the end? That's what your explanation implied.
Yes James, you got it right. I meant "--size=8".
Hi Team,
I played further with Python's re.findall() and I am now able to extract all
the required fields. I have two further questions; please advise.
Question 1:
Please point out any mistakes in the code below and suggest whether it can
be improved with a better regex.
# This code extracts various fields from a single line of a log file
# (assuming the line has already been matched) that contains various
# values; the extracted values are then to be stored in a dictionary.
import re

line = ('06/12/2018 11:13:23 AM python toolname.py --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')

# loc is a number
r_loc = r'--loc=([0-9]+)'
r_size = r'--size=([0-9]+)'
r_struct = r'--struct=([A-Za-z_]+)'
r_log_file = r'--log_file=([A-Za-z0-9_/.]+)'

if re.findall(r_loc, line):
    print(re.findall(r_loc, line))
if re.findall(r_size, line):
    print(re.findall(r_size, line))
if re.findall(r_struct, line):
    print(re.findall(r_struct, line))
if re.findall(r_log_file, line):
    print(re.findall(r_log_file, line))
o/p:
root at X1:/Play_ground/SPECIAL_TYPES/REGEX# python regex_002.py
['0']
['8']
['data_block']
['/var/1000111/test18.log']
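Since the extracted values are meant to end up in a dictionary, one possible simplification is a single generic pattern instead of one pattern per option. The sketch below is only an illustration: it assumes every option has the form --name=value and that values contain no whitespace. The names option_re, options, wanted and fields are made up for the example.

```python
import re

line = ('06/12/2018 11:13:23 AM python toolname.py --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')

# One generic pattern: the option name after "--", then "=", then the value
# (assumption: a value never contains whitespace).
option_re = re.compile(r'--(\w+)=(\S+)')

# findall() returns (name, value) tuples, which dict() accepts directly.
options = dict(option_re.findall(line))

# Keep only the fields of interest (names taken from the sample line).
wanted = ('struct', 'log_file', 'loc', 'size')
fields = {name: options[name] for name in wanted if name in options}

print(fields)
```

All values come back as strings, so loc and size would still need int() if numbers are required. The per-field patterns above are stricter about what a value may contain; this version trades that validation for brevity.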
Question 2:
I tried using re.search() with a lookbehind assertion and it seems to
work. Any comments or suggestions?
Example:
import re

line = ('06/12/2018 11:13:23 AM python toolname.py --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')

match = re.search(r'(?P<loc>(?<=--loc=)([0-9]+))', line)
if match:
    print(match.group('loc'))
o/p:
root at X1:/Play_ground/SPECIAL_TYPES/REGEX# python regex_002.py
0
I want to build the sub-patterns and use match.group() to get the values,
something like what is shown below, but it doesn't seem to work:
match = re.search(r'(?P<loc>(?<=--loc=)([0-9]+))'
                  r'(?P<size>(?<=--size=)([0-9]+))', line)
if match:
    print(match.group('loc'))
    print(match.group('size'))
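A likely reason the combined pattern never matches: the two groups are concatenated, so the digits for size would have to start exactly where the digits for loc end, and at that position the lookbehind for --size= cannot succeed. A minimal sketch of one alternative, assuming the options may appear in any order on the line, is to give each field its own lookahead and use plain named groups, with no lookbehind. The names broken and pattern are just for illustration.

```python
import re

line = ('06/12/2018 11:13:23 AM python toolname.py --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')

# The concatenated lookbehind version: the size group must begin right after
# the loc digits, where the preceding text is not "--size=", so this is None.
broken = re.search(r'(?P<loc>(?<=--loc=)([0-9]+))'
                   r'(?P<size>(?<=--size=)([0-9]+))', line)

# Each lookahead scans the line independently from the same starting point,
# so the options may appear in any order with other text between them.
pattern = re.compile(
    r'(?=.*--loc=(?P<loc>[0-9]+))'
    r'(?=.*--size=(?P<size>[0-9]+))'
)

match = pattern.match(line)
if match:
    print(match.group('loc'))
    print(match.group('size'))
```

The match itself is empty because lookaheads consume no text; only the named groups carry the values. If either option is missing from the line, the whole match fails, which may or may not be the behaviour wanted.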
Regards,
Ganesh