Using re to perform grep functionality in Python

Cameron Simpson cs at zip.com.au
Wed Mar 1 17:54:42 EST 2017


On 01Mar2017 13:55, robert at forzasilicon.com <robert at forzasilicon.com> wrote:
>I'm relatively new to Python, and I am having some trouble with one of my 
>scripts. Basically, this script connects to a server via ssh, runs Dell's 
>omreport output, and then externally pipes it to a mail script in cron. The 
>script uses an external call to grep via subprocess, but I would like to 
>internalize everything to run in Python. I've read that re can accomplish the 
>filtering, but I am having trouble getting it to duplicate what I already 
>have.
>
>##### USING EXTERNAL GREP #####
>
># display RAID controller info
>cmd = ['ssh root@server.ip -C \"/opt/dell/srvadmin/sbin/omreport storage 
>vdisk\"']
>
>print '########## Array Status ##########'
>p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)

Might I recommend that you drop the shell quotes and invoke ssh directly?

    cmdargs = [ 'ssh', 'root@server.ip', '-C',
                '/opt/dell/srvadmin/sbin/omreport storage vdisk' ]
    p = subprocess.Popen(cmdargs, shell=False, stdout=subprocess.PIPE)

And since shell=False is the default you can leave it off. It is just generally 
better to avoid using the shell for commands as it introduces a layer of 
quoting and escaping, usually pointlessly and always with a risk of misuse.
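
A minimal sketch of the shell-free approach, written for Python 3 (on Python 2 
the decode step is unnecessary). The helper name run_lines is made up for 
illustration, and the demo runs the Python interpreter itself in place of ssh 
so that it works on any machine:

```python
import subprocess
import sys


def run_lines(cmdargs):
    """Run a command without a shell and return its stdout as a list of lines.

    run_lines is a hypothetical helper, not part of subprocess.
    """
    # A list of arguments and no shell=True: each item reaches the
    # command exactly as written, with no extra layer of shell quoting.
    p = subprocess.Popen(cmdargs, stdout=subprocess.PIPE)
    out, _ = p.communicate()
    if p.returncode != 0:
        raise RuntimeError('command failed with status %d' % p.returncode)
    return out.decode('utf-8', 'replace').splitlines()


# Stand-in command so the sketch runs anywhere; for the real report the
# list would be the ssh argument list shown above.
demo = [sys.executable, '-c', 'print("ID : 0"); print("Status : Ok")']
lines = run_lines(demo)
print(lines)
```

Collecting the output as a list of text lines up front means whatever 
filtering comes next can be done with ordinary string operations.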

>if call(["grep", "-i", "ID"], stdin=p.stdout) != 0:

That matches "id" anywhere in a line, in either case, which is fairly loose. 
More commentary below the output...

[...]
>This gives me:
>ID                                : 0
>Layout                            : RAID-1
>Associated Fluid Cache State      : Not Applicable
>----
>
>vs the full output of:
>
>########## Array Status ##########
>List of Virtual Disks in the System
>
>Controller PERC H310 Mini (Embedded)
>ID                                : 0
>Status                            : Ok
>Name                              : Main
>State                             : Ready
>Hot Spare Policy violated         : Not Assigned
>Encrypted                         : Not Applicable
>Layout                            : RAID-1
>Size                              : 2,794.00 GB (3000034656256 bytes)
>T10 Protection Information Status : No
>Associated Fluid Cache State      : Not Applicable
>Device Name                       : /dev/sda
>Bus Protocol                      : SATA
>Media                             : HDD
>Read Policy                       : No Read Ahead
>Write Policy                      : Write Through
>Cache Policy                      : Not Applicable
>Stripe Element Size               : 64 KB
>Disk Cache Policy                 : Enabled
>
>##### END ####
>
>I've been tinkering around with re, and the only code that I've come up with that doesn't give me a type error is this:

[...]
>for line in p.stdout:
>        final = re.findall('ID', line, re.DOTALL)
>        print final
[...]
>[]
>[]
>[]
>['ID']
[...]

findall() returns a list of all the matches. What you want is "if the pattern 
matched anywhere in the line, print the whole line", and for that you would 
only need re.search() (since it is enough to find just one match, not all of 
them).
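
If you do want to stay with re, a grep-alike could be sketched roughly as 
follows. The name grep_lines is made up for this example, the lines are 
assumed to be text strings already, and the sample lines are taken from the 
report quoted above:

```python
import re


def grep_lines(lines, pattern, ignore_case=False):
    """Return the lines in which pattern matches anywhere, like grep.

    grep_lines is a hypothetical helper name used for this sketch.
    """
    flags = re.IGNORECASE if ignore_case else 0
    regexp = re.compile(pattern, flags)
    # search() succeeds if the pattern matches anywhere in the line.
    return [line for line in lines if regexp.search(line)]


sample = [
    'Controller PERC H310 Mini (Embedded)',
    'ID                                : 0',
    'Status                            : Ok',
    'Layout                            : RAID-1',
    'Associated Fluid Cache State      : Not Applicable',
]

# Like "grep -i ID": also catches the "ID" in RAID and the "id" in Fluid.
for line in grep_lines(sample, 'ID', ignore_case=True):
    print(line)

# Anchoring the pattern to the field name keeps only the real ID line.
for line in grep_lines(sample, r'^ID\s*:'):
    print(line)
```

The first loop reproduces the three lines the external grep gave; the second 
shows how much tighter the match has to be to get only the field you meant.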

Might I suggest you don't use re at all? It has its uses but for your initial 
needs it is not necessary, and regexps are easy to get wrong (not to mention 
slow). Do a fixed string split:

    # trim the trailing newline and other padding if any
    line = line.strip()
    try:
        lhs, rhs = line.split(':', 1)
    except ValueError:
        # no colon, probably a heading
        ... handle a heading, if you care ...
    else:
        field = lhs.strip()
        value = rhs.strip()
        if field == 'ID':
            ... note the controller id ...
        elif field == 'Layout':
            ... note the layout value ...
        ... whatever other fields you care about ...
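
Filled in as something runnable, the outline above might look like the sketch 
below. The function name parse_field and the choice of wanted fields are 
assumptions for illustration; the sample lines come from the report quoted 
earlier:

```python
def parse_field(line):
    """Split 'Name : value' into (name, value); return None if no colon.

    parse_field is a hypothetical helper name used for this sketch.
    """
    # trim the trailing newline and other padding if any
    line = line.strip()
    try:
        # maxsplit=1 so that colons inside the value are left alone
        lhs, rhs = line.split(':', 1)
    except ValueError:
        # no colon, probably a heading
        return None
    return lhs.strip(), rhs.strip()


sample = [
    'Controller PERC H310 Mini (Embedded)',
    'ID                                : 0',
    'Status                            : Ok',
    'Layout                            : RAID-1',
    'Associated Fluid Cache State      : Not Applicable',
]

wanted = ('ID', 'Layout')
for line in sample:
    parsed = parse_field(line)
    if parsed is None:
        continue
    field, value = parsed
    # Exact comparison on the field name: "Fluid" and "RAID" no longer match.
    if field in wanted:
        print('%s = %s' % (field, value))
```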

You can see that in this way you can handle fields very specifically as you see 
fit.  Some RAID monitoring tools produce quite long reports, so you can also 
track state (what heading/controller/raidset am I in at present?) and build a 
data structure for better reporting.
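
One way to track that state is sketched below. It assumes the layout seen in 
the sample report (a "Controller ..." heading followed by fields, with each 
"ID" field starting a new virtual disk); other omreport output may well be 
laid out differently, and parse_report is a made-up name:

```python
def parse_report(lines):
    """Group 'field : value' lines into one dict per virtual disk.

    Assumption: a 'Controller ...' heading precedes the fields, and each
    'ID' field starts a new virtual disk, as in the sample report.
    """
    vdisks = []
    controller = None   # heading we are currently under
    current = None      # dict for the virtual disk being filled in
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ':' not in line:
            # A heading: remember the controller we are now inside.
            if line.startswith('Controller '):
                controller = line[len('Controller '):]
            continue
        lhs, rhs = line.split(':', 1)
        field, value = lhs.strip(), rhs.strip()
        if field == 'ID':
            current = {'Controller': controller}
            vdisks.append(current)
        if current is not None:
            current[field] = value
    return vdisks


sample = [
    'List of Virtual Disks in the System',
    '',
    'Controller PERC H310 Mini (Embedded)',
    'ID                                : 0',
    'Status                            : Ok',
    'Layout                            : RAID-1',
]

for vd in parse_report(sample):
    print('%s: vdisk %s is %s (%s)'
          % (vd['Controller'], vd['ID'], vd['Status'], vd['Layout']))
```

With the records in a list of dicts, the mail report can pick out only the 
disks whose Status is not "Ok" instead of mailing the whole listing.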

As an example, have a gander at this:

    https://pypi.python.org/pypi/cs.app.megacli/20160310

which I wrote for IBM MegaRAIDs; it also does Dell PowerEdge (PERC) 
controllers.

It is deliberately Python 2 BTW, because it had to run on RHEL 5 which had 
Python 2.4 as the vendor supplied python.

Cheers,
Cameron Simpson <cs at zip.com.au>

Some people, when confronted with a problem, think "I know, I'll use regular 
expressions." Now they have two problems.
- Jamie Zawinski, in alt.religion.emacs


