convert script awk in python

Peter Otten __peter__ at web.de
Thu Mar 25 04:51:10 EDT 2021


On 25/03/2021 08:14, Loris Bennett wrote:

> I'm not doing that, but I am trying to replace a longish bash pipeline
> with Python code.
> 
> Within Emacs, I often use Org mode[1] to generate data via some bash
> commands and then visualise the data via Python.  Thus, in a single Org
> file I run
> 
>    /usr/bin/sacct  -u $user -o jobid -X -S $start -E $end -s COMPLETED -n  | \
>    xargs -I {} seff {} | grep 'Efficiency' | sed '$!N;s/\n/ /' | awk '{print $3 " " $9}' | sed 's/%//g'
>     
> The raw numbers are formatted by Org into a table
> 
>    | cpu_eff | mem_eff |
>    |---------+---------|
>    |    96.6 |   99.11 |
>    |   93.43 |   100.0 |
>    |    91.3 |   100.0 |
>    |   88.71 |   100.0 |
>    |   89.79 |   100.0 |
>    |   84.59 |   100.0 |
>    |   83.42 |   100.0 |
>    |   86.09 |   100.0 |
>    |   92.31 |   100.0 |
>    |   90.05 |   100.0 |
>    |   81.98 |   100.0 |
>    |   90.76 |   100.0 |
>    |   75.36 |   64.03 |
> 
> I then read this into some Python code in the Org file and do something like
> 
>    df = pd.DataFrame(eff_tab[1:], columns=eff_tab[0])
>    cpu_data = df.loc[: , "cpu_eff"]
>    mem_data = df.loc[: , "mem_eff"]
> 
>    ...
> 
>    n, bins, patches = axis[0].hist(cpu_data, bins=range(0, 110, 5))
>    n, bins, patches = axis[1].hist(mem_data, bins=range(0, 110, 5))
> 
> which generates nice histograms.
> 
> I decided to rewrite the whole thing as a stand-alone Python program so
> that I can run it as a cron job.  However, as a novice Python programmer
> I am finding translating the bash part slightly clunky.  I am in the
> middle of doing this and started with the following:
> 
>          sacct = subprocess.Popen(["/usr/bin/sacct",
>                                    "-u", user,
>                                    "-S", period[0], "-E", period[1],
>                                    "-o", "jobid", "-X",
>                                    "-s", "COMPLETED", "-n"],
>                                   stdout=subprocess.PIPE,
>          )
> 
>          jobids = []
> 
>          for line in sacct.stdout:
>              jobid = str(line.strip(), 'UTF-8')
>              jobids.append(jobid)
> 
>          for jobid in jobids:
>              seff = subprocess.Popen(["/usr/bin/seff", jobid],
>                                      stdin=sacct.stdout,
>                                      stdout=subprocess.PIPE,
>              )

The statement above looks odd. If seff can read the jobids from stdin 
there should be no need to pass them individually, like:

sacct = ...
seff = subprocess.Popen(
    ["/usr/bin/seff"], stdin=sacct.stdout, stdout=subprocess.PIPE,
    universal_newlines=True
)
for line in seff.communicate()[0].splitlines():
    ...
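
Going a step further, the whole bash pipeline can collapse into one short
script. The sketch below makes two assumptions: the sacct/seff paths and
options are copied verbatim from your post, and seff is assumed to print
lines like "CPU Efficiency: 96.60% of ..." and "Memory Efficiency:
99.11% of ..." -- adjust the regex if your seff formats them differently:

```python
import re
import subprocess

# Assumed seff output format -- tweak the labels/pattern to match
# what your seff version actually prints.
EFF_RE = re.compile(r"^(CPU|Memory) Efficiency:\s*([\d.]+)%", re.MULTILINE)

def extract_efficiencies(seff_text):
    """Return (cpu_eff, mem_eff) as floats from one job's seff output."""
    found = dict(EFF_RE.findall(seff_text))
    return float(found["CPU"]), float(found["Memory"])

def collect_efficiencies(user, start, end):
    """Run sacct once, then seff per job; return a list of (cpu, mem) pairs."""
    sacct = subprocess.run(
        ["/usr/bin/sacct", "-u", user, "-S", start, "-E", end,
         "-o", "jobid", "-X", "-s", "COMPLETED", "-n"],
        capture_output=True, text=True, check=True,
    )
    rows = []
    for jobid in sacct.stdout.split():
        seff = subprocess.run(["/usr/bin/seff", jobid],
                              capture_output=True, text=True, check=True)
        rows.append(extract_efficiencies(seff.stdout))
    return rows
```

subprocess.run with capture_output/text needs Python 3.7+; on older
versions use stdout=subprocess.PIPE and universal_newlines=True instead.
The returned list then feeds straight into
pd.DataFrame(rows, columns=["cpu_eff", "mem_eff"]) for the histograms.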


>              seff_output = []
>              for line in seff.stdout:
>                  seff_output.append(str(line.strip(), "UTF-8"))
> 
>              ...
> 
> but compared to the bash pipeline, this all seems a bit laboured.
> 
> Does any one have a better approach?
> 
> Cheers,
> 
> Loris
> 
> 
>> -----Original Message-----
>> From: Cameron Simpson <cs at cskk.id.au>
>> Sent: Wednesday, March 24, 2021 6:34 PM
>> To: Avi Gross <avigross at verizon.net>
>> Cc: python-list at python.org
>> Subject: Re: convert script awk in python
>>
>> On 24Mar2021 12:00, Avi Gross <avigross at verizon.net> wrote:
>>> But I wonder how much languages like AWK are still used to write new
>>> programs, compared to the time when they were really useful.
>>
>> You mentioned in an adjacent post that you've not used AWK since 2000.
>> By contrast, I still use it regularly.
>>
>> It's great for proof of concept at the command line or in small scripts, and
>> as the innards of quite useful scripts. I've a trite "colsum" script which
>> does nothing but generate and run a little awk programme to sum a column,
>> and routinely type "blah .... | colsum 2" or the like to get a tally.
>>
>> I totally agree that once you're processing a lot of data, or a shell
>> script is making long pipelines or many command invocations, and that
>> becomes a performance issue, it is time to recode.
>>
>> Cheers,
>> Cameron Simpson <cs at cskk.id.au>
> 
> Footnotes:
> [1]  https://orgmode.org/
> 
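Tangentially, Cameron's colsum trick is nearly as short in Python. A
hypothetical equivalent (a sketch, not the actual script; column numbers
are 1-based, as in awk):

```python
import sys

def colsum(lines, col):
    """Sum whitespace-separated column `col` (1-based, awk-style)."""
    return sum(float(line.split()[col - 1]) for line in lines if line.strip())

if __name__ == "__main__":
    # e.g.  blah ... | python colsum.py 2
    print(colsum(sys.stdin, int(sys.argv[1])))
```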



