convert script awk in python

Thu Mar 25 03:14:33 EDT 2021

"Avi Gross" <avigross at verizon.net> writes:

> Just to be clear, Cameron, I retired very early and thus have had no reason
> to use AWK in a work situation and for a while was not using UNIX-based
> machines. I have no doubt I would have continued using AWK as one part of my
> toolkit for years albeit less often as I found other tools better for some
> situations, let alone the kind I mentioned earlier that are not text-file
> based such as databases.
>
> It is, as noted, a great tool and if you only had one or a few tools like it
> available, it can easily be bent and twisted to do much of what the others
> do as it is more programmable than most. But following that line of
> reasoning, fairly simple python scripts can be written with python -c "..."
> or by pointing to a script.
>
> Anyone have a collection of shell scripts that can be used in pipelines
> where each piece is just a call to python to do something simple?

I'm not doing that, but I am trying to replace a longish bash pipeline
with Python code.

Within Emacs, I often use Org mode[1] to generate data via some bash
commands and then visualise the data via Python.  Thus, in a single Org
file I run

  /usr/bin/sacct -u $user -o jobid -X -S $start -E $end -s COMPLETED -n | \
  xargs -I {} seff {} | grep 'Efficiency' | sed '$!N;s/\n/ /' | awk '{print $3 " " $9}' | sed 's/%//g'

The raw numbers are formatted by Org into a table

  | cpu_eff | mem_eff |
  |---------+---------|
  |    96.6 |   99.11 |
  |   93.43 |   100.0 |
  |    91.3 |   100.0 |
  |   88.71 |   100.0 |
  |   89.79 |   100.0 |
  |   84.59 |   100.0 |
  |   83.42 |   100.0 |
  |   86.09 |   100.0 |
  |   92.31 |   100.0 |
  |   90.05 |   100.0 |
  |   81.98 |   100.0 |
  |   90.76 |   100.0 |
  |   75.36 |   64.03 |

I then read this into some Python code in the Org file and do something like

  df = pd.DataFrame(eff_tab[1:], columns=eff_tab[0])
  cpu_data = df.loc[:, "cpu_eff"]
  mem_data = df.loc[:, "mem_eff"]

  ...

  n, bins, patches = axis[0].hist(cpu_data, bins=range(0, 110, 5))
  n, bins, patches = axis[1].hist(mem_data, bins=range(0, 110, 5))

which generates nice histograms.

I decided to rewrite the whole thing as a stand-alone Python program so
that I can run it as a cron job.  However, as a novice Python programmer
I am finding translating the bash part slightly clunky.  I am in the
middle of doing this and started with the following:

        sacct = subprocess.Popen(["/usr/bin/sacct",
                                  "-u", user,
                                  "-S", period[0], "-E", period[1],
                                  "-o", "jobid", "-X",
                                  "-s", "COMPLETED", "-n"],
                                 stdout=subprocess.PIPE,
        )

        jobids = []

        for line in sacct.stdout:
            jobid = str(line.strip(), 'UTF-8')
            jobids.append(jobid)

        for jobid in jobids:
            # seff takes the job ID as an argument, so it needs no stdin
            # (sacct.stdout has already been read to the end above).
            seff = subprocess.Popen(["/usr/bin/seff", jobid],
                                    stdout=subprocess.PIPE,
            )
            seff_output = []
            for line in seff.stdout:
                seff_output.append(str(line.strip(), "UTF-8"))

            ...

but compared to the bash pipeline, this all seems a bit laboured.
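
One way this might be tightened, as a rough sketch rather than a tested
solution: subprocess.run with capture_output=True and text=True (Python
3.7 or later) removes the manual decoding, and a regular expression can
stand in for the grep/sed/awk/sed stage.  The function names and the
sample seff lines shown in the comments are assumptions made for
illustration; the real seff output on a given cluster may differ.

```python
import re
import subprocess

# Assumed shape of the two seff lines of interest, for example:
#   CPU Efficiency: 96.60% of 01:23:45 core-walltime
#   Memory Efficiency: 99.11% of 4.00 GB
EFF_RE = re.compile(r"^(CPU|Memory) Efficiency:\s+([\d.]+)%", re.MULTILINE)


def completed_jobids(user, start, end):
    """Run the same sacct call as in the pipeline; return job IDs as strings."""
    result = subprocess.run(
        ["/usr/bin/sacct", "-u", user, "-S", start, "-E", end,
         "-o", "jobid", "-X", "-s", "COMPLETED", "-n"],
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def parse_efficiencies(seff_text):
    """Return (cpu_eff, mem_eff) as floats, or None if either value is missing."""
    found = {kind: float(value) for kind, value in EFF_RE.findall(seff_text)}
    if "CPU" in found and "Memory" in found:
        return found["CPU"], found["Memory"]
    return None


def efficiencies(user, start, end):
    """Collect (cpu_eff, mem_eff) pairs for all completed jobs in the period."""
    rows = []
    for jobid in completed_jobids(user, start, end):
        seff = subprocess.run(["/usr/bin/seff", jobid],
                              capture_output=True, text=True, check=True)
        pair = parse_efficiencies(seff.stdout)
        if pair is not None:
            rows.append(pair)
    return rows
```

The list returned by efficiencies(user, start, end) could then go
straight into pd.DataFrame(rows, columns=["cpu_eff", "mem_eff"]),
without the detour through an Org table.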

Does anyone have a better approach?

Cheers,

Loris

> -----Original Message-----
> From: Cameron Simpson <cs at cskk.id.au> 
> Sent: Wednesday, March 24, 2021 6:34 PM
> To: Avi Gross <avigross at verizon.net>
> Cc: python-list at python.org
> Subject: Re: convert script awk in python
>
> On 24Mar2021 12:00, Avi Gross <avigross at verizon.net> wrote:
>>But I wonder how much languages like AWK are still used to make new 
>>programs as compared to a time they were really useful.
>
> You mentioned in an adjacent post that you've not used AWK since 2000.  
> By contrast, I still use it regularly.
>
> It's great for proof of concept at the command line or in small scripts, and
> as the innards of quite useful scripts. I've a trite "colsum" script which
> does nothing but generate and run a little awk programme to sum a column,
> and routinely type "blah .... | colsum 2" or the like to get a tally.
>
> I totally agree that once you're processing a lot of data from places or
> where a shell script is making long pipelines or many command invocations,
> if that's a performance issue it is time to recode.
>
> Cheers,
> Cameron Simpson <cs at cskk.id.au>

Footnotes: 
[1]  https://orgmode.org/

-- 
This signature is currently under construction.