Regex to extract multiple fields in the same line

Nathan Hilterbrand nhilterbrand at gmail.com
Fri Jun 15 09:45:43 EDT 2018


On Wed, Jun 13, 2018 at 4:08 AM, Ganesh Pal <ganesh1pal at gmail.com> wrote:

>  Hi Team,
>
> I wanted to parse a file and extract a few fields that are present after "="
> in a text file.
>
>
> For example, from the line below I need to extract the values present after
> --struct=, --loc=, --size= and --log_file=
>
> Sample input
>
> line = '06/12/2018 11:13:23 AM python toolname.py  --struct=data_block
> --log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10
> --path=/tmp/data_block.txt size=8'
>
>
> Expected output
>
> data_block
> /var/1000111/test18.log
> 0
> 8
>
>
> Here is my sample code; it's still not complete. I wanted to use regex
> to find and extract all the fields after "=". Any suggestions or
> alternative ways to optimize this further?
>
>
> import re
> line = ('06/12/2018 11:13:23 AM python toolname.py  --struct=data_block '
>         '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
>         '--path=/tmp/data_block.txt size=8')
>
> r_loc = r"--loc=(\d+)"
> r_struct = r'--struct=(\w+)'
>
> if re.findall(r_loc, line):
>    print re.findall(r_loc, line)
>
> if re.findall(r_struct, line):
>    print re.findall(r_struct, line)
>
>
> root at X1:/# python regex_02.py
> ['0']
> ['data_block']
>
>
> I am a Linux user with Python 2.7
>
>
> Regards,
> Ganesh
>

---

Oops...  I didn't notice the 'python 2.7' part until after I had coded up
something.  That said, dict comprehensions and str.format() both exist in
Python 2.7, so this solution should need little or no squeezing to work
there.

I am actually a Perl guy, and I love regex... in Perl.  I do tend to avoid
it if at all possible in Python, though.  I pieced together an ugly function
to do what you want.  It does not hard-code the argument names, so if a new
argument is added to your records, it should handle it with no changes.  In
Python, I love comprehensions almost as much as I love regex in Perl.

Anyway, here is my n00bish stab at it:

#!/usr/bin/python3

line = ('06/12/2018 11:13:23 AM python toolname.py  --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt size=8')

def get_args(lne):

    first_hyphens = lne.find('--')    # find where the args start
    if first_hyphens == -1:            # Blow up if ill-formed arg
        raise ValueError("Invalid input line")
    prelude = lne[0:first_hyphens]      # Get the stuff before the args
    prelude = prelude.strip()
    args = lne[first_hyphens:]          # Get the arguments
    #  The following line uses comprehensions to build a dictionary of
    #  argument/value pairs.  The arguments portion is split on whitespace,
    #  then each --XX=YY pair is split on the first '=' into a two-element
    #  list.  Each two-element list then provides the key and value for one
    #  argument.  Note that lstrip('-') strips the leading '--' from the
    #  argument name, and leaves a name like 'size' (no hyphens) intact.
    argdict = {d[0].lstrip('-'): d[1]
               for d in [e.split('=', 1) for e in args.split()]}
    return (prelude, argdict)

if __name__ == "__main__":
    pre, argdict = get_args(line)
    print('Prelude is "{}"'.format(pre))
    for arg in argdict:
        print('  Argument {} = {}'.format(arg, argdict[arg]))


