Tips to match multiple patterns from a single file

Ganesh Pal ganesh1pal at gmail.com
Sun Jul 23 13:21:33 EDT 2017


I have hundreds of files in a directory, and from each of them I need to
extract several values: the filename with its path (the names start with
test*), the object address such as 1,1,25296896:8192 (only from the line
containing the pattern "Corrupting"), the data before corruption (a hex
value), the offset (digits), and the size (digits).



Sample file contents (all my files are small):



07/22/2017 12:34:28 AM INFO: --offset=18 --mirror=1 --path=/ifs/i/inode.txt --size=4

07/22/2017 12:34:28 AM INFO:The mirror selected is 1,1,25296896:8192

07/22/2017 12:34:28 AM INFO:Data before corruption : 1b000100

07/22/2017 12:34:28 AM INFO:Corrupting disk object 6 at 1,1,25296896:8192

07/22/2017 12:34:28 AM INFO:Data after corruption : 00000000


I am expecting something like this:



# Filename : /var/01010101/test01log    object: 1,1,25296896:8192  checksum : 1b000100  offset: 18  size:4

# Filename : /var/01010101/test03log    object: 1,2,25296896:8192  checksum : 1b200120  offset: 8    size:8



Here is how I have started coding this, but I am not sure how to group
multiple patterns and return them from a function. I am trying with group()
and groupdict(). Any tips or better ideas?



import glob
import re

for filename in sorted(glob.glob('/var/01010101/test*.log')):
    with open(filename, 'r') as f:
        for linenum, line in enumerate(f):
            m = re.search(r'(Corrupting.*)', line)
            if not m:
                # uninteresting line
                continue
            # the last whitespace-separated token is the object address
            x = m.group().split()
            print filename, x[-1]





x123-45# python  test.py

/var/01010101/test01_.log 1,1,25296896:8192


I am on Python 2.7 and Linux
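
One possible direction, as a sketch only: since the files are small, each
one could be read whole and searched with one compiled pattern per value,
using named groups so that groupdict() results can be merged into a single
dict and returned from a function. The patterns below are guesses based on
the sample log above, and the names PATTERNS, extract_fields, format_record
and report, as well as the group names, are made up for illustration. It
should run on Python 2.7 as well as Python 3.

```python
import glob
import re

# One compiled pattern per value of interest. These patterns are assumptions
# inferred from the sample log lines; the group names are illustrative.
PATTERNS = [
    re.compile(r'--offset=(?P<offset>\d+)'),
    re.compile(r'--size=(?P<size>\d+)'),
    re.compile(r'Data before corruption\s*:\s*(?P<checksum>[0-9a-fA-F]+)'),
    re.compile(r'Corrupting disk object \d+ at (?P<object>\S+)'),
]


def extract_fields(text):
    """Return a dict with the named groups of every pattern found in text."""
    fields = {}
    for pattern in PATTERNS:
        m = pattern.search(text)
        if m:
            # groupdict() maps each group name to the matched string
            fields.update(m.groupdict())
    return fields


def format_record(filename, fields):
    """Build one summary line; values that were not found show as None."""
    return ('# Filename : %s    object: %s  checksum : %s  offset: %s  size:%s'
            % (filename, fields.get('object'), fields.get('checksum'),
               fields.get('offset'), fields.get('size')))


def report(glob_pattern):
    """Print one summary line per file that contains a 'Corrupting' line."""
    for filename in sorted(glob.glob(glob_pattern)):
        with open(filename, 'r') as f:
            fields = extract_fields(f.read())
        if 'object' in fields:
            print(format_record(filename, fields))
```

It would be called as report('/var/01010101/test*.log'). Searching the whole
file text rather than line by line keeps the function short, and it only
makes sense while each file describes a single corruption event; with
several events per file, re.finditer() per pattern would be needed instead.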


Regards,

Ganesh


