Regex to extract multiple fields in the same line
Nathan Hilterbrand
nhilterbrand at gmail.com
Fri Jun 15 09:45:43 EDT 2018
On Wed, Jun 13, 2018 at 4:08 AM, Ganesh Pal <ganesh1pal at gmail.com> wrote:
> Hi Team,
>
> I want to parse a text file and extract a few fields that appear after "="
> in each line.
>
>
> For example, from the line below I need to extract the values present after
> --struct=, --loc=, --size= and --log_file=
>
> Sample input
>
> line = '06/12/2018 11:13:23 AM python toolname.py --struct=data_block
> --log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10
> --path=/tmp/data_block.txt size=8'
>
>
> Expected output
>
> data_block
> /var/1000111/test18.log
> 0
> 8
>
>
> Here is my sample code. It is still not complete; I want to use a regex to
> find and extract all the fields after "=". Any suggestions or alternative
> ways to optimize this further?
>
>
> import re
> line = ('06/12/2018 11:13:23 AM python toolname.py --struct=data_block '
>         '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
>         '--path=/tmp/data_block.txt size=8')
>
> r_loc = r"--loc=(\d+)"
> r_struct = r'--struct=(\w+)'
>
> if re.findall(r_loc, line):
>     print re.findall(r_loc, line)
>
> if re.findall(r_struct, line):
>     print re.findall(r_struct, line)
>
>
> root@X1:/# python regex_02.py
> ['0']
> ['data_block']
>
>
> I am a Linux user with Python 2.7
>
>
> Regards,
> Ganesh
>
---
Oops... I didn't notice the 'python 2.7' part until after I had coded up
something. This solution could probably be adapted to work with Python 2,
though.

I am actually a Perl guy, and I love regex... in Perl. I tend to avoid it
if at all possible in Python, though. I pieced together an ugly function
to do what you want. It does not hard-code the argument names, so if a new
argument is added to your records, it should handle it with no changes. In
Python, I love comprehensions almost as much as I love regex in Perl.

Anyway, here is my n00bish stab at it:

#!/usr/bin/python3

line = ('06/12/2018 11:13:23 AM python toolname.py --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt size=8')


def get_args(lne):
    first_hyphens = lne.find('--')   # find where the args start
    if first_hyphens == -1:          # Blow up if ill-formed arg
        raise ValueError("Invalid input line")
    prelude = lne[0:first_hyphens]   # Get the stuff before the args
    prelude = prelude.strip()
    args = lne[first_hyphens:]       # Get the arguments
    # The following line uses comprehensions to build a dictionary of
    # argument/value pairs.  The arguments portion is split on whitespace,
    # then each --XX=YY pair is split on the first '=' into a two element
    # list.  Each two element list then provides the key and value for
    # each argument.  Note that 'd[0].lstrip("-")' strips the leading
    # hyphens from the argument name; the bare 'size=8' in the sample has
    # none and is kept whole.
    argdict = {d[0].lstrip('-'): d[1]
               for d in [e.split('=', 1) for e in args.split()]}
    return (prelude, argdict)


if __name__ == "__main__":
    pre, argdict = get_args(line)
    print('Prelude is "{}"'.format(pre))
    for arg in argdict:
        print('   Argument {} = {}'.format(arg, argdict[arg]))
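
For comparison, here is a rough regex-based variant, closer to what the
original question asked for. Treat it as a sketch under a few assumptions:
the helper name extract_fields and the pattern are made up for
illustration, values are assumed to contain no whitespace, and the leading
'--' is treated as optional because the sample line has a bare 'size=8'.
It only uses re.findall and single-argument print(), so it should run on
both Python 2.7 and Python 3.

```python
import re

line = ('06/12/2018 11:13:23 AM python toolname.py --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt size=8')

# Optional leading '--', a word-like name, '=', then a run of non-whitespace.
# (?<!\S) requires the pair to start at the beginning of a token.
PAIR_RE = re.compile(r'(?<!\S)(?:--)?(\w+)=(\S+)')


def extract_fields(text, wanted):
    """Return the values for the names in `wanted`, in that order.

    Hypothetical helper; names missing from `text` are skipped.
    """
    found = dict(PAIR_RE.findall(text))
    return [found[name] for name in wanted if name in found]


if __name__ == "__main__":
    for value in extract_fields(line, ['struct', 'log_file', 'loc', 'size']):
        print(value)
```

Building the whole dictionary first and then picking names keeps the
pattern independent of which fields are wanted, at the cost of parsing
pairs that are never used.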