How to read such a file and summarize the data?

Wed Nov 17 18:38:58 EST 2010

On Wed, 17 Nov 2010 13:45:58 -0800, huisky wrote:

> Say I have the following log file, which records the code usage. I want to
> read this file and summarize how much total CPU time was consumed by
> each user.
>
Two points you should think about:

- I don't think you can extract CPU time from this log: you can get
  the elapsed time of each process and the number of CPUs each run used,
  but you can't calculate CPU time from those values since you don't
  know how long the process spent waiting for I/O, etc.

- Is the first (numeric) part of the first field on the line a process id?
  If it is, you can match start and stop messages on the value of the
  first field provided that this value can never be shared by two
  processes that are both running. If you can get simultaneous
  duplicates, then you're out of luck because you'll never be able to 
  match up start and stop lines.
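
To check whether the log suffers from that second problem, something
like the following rough sketch would do. It assumes you have already
reduced each log line to an (id, event) pair where event is "start" or
"stop"; that pair layout and the name overlapping_ids are my own
invention, not anything taken from your log.

```python
def overlapping_ids(events):
    """Return the ids that are started again while an earlier run is still open."""
    running = set()   # ids that have started but not yet stopped
    clashes = set()   # ids seen starting twice without a stop in between
    for run_id, event in events:
        if event == "start":
            if run_id in running:
                clashes.add(run_id)
            running.add(run_id)
        elif event == "stop":
            running.discard(run_id)
    return clashes


# Hypothetical sample: id "101" is started twice before it stops.
events = [("101", "start"), ("102", "start"), ("101", "start"),
          ("101", "stop"), ("102", "stop")]
print(overlapping_ids(events))   # {'101'}
```

If this returns an empty set for the whole log, matching on the first
field should be safe.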


> Is Python able to do this, or is it easy to achieve? Can anybody give
> me some hints? I'd appreciate it very much!
>
Sure. There are two approaches possible:
- sort the log on the first two fields and then process it with Python
  knowing that start and stop lines will be adjacent

- use the first field as the key to a dictionary (an associative array)
  and store the start time and CPU count under that key. When a matching
  stop line is found, retrieve the entry, calculate and output or total
  the usage figure for that run, and delete the entry.
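
A minimal sketch of the second approach, using a Python dict. Since I
don't know the exact layout of your log, it assumes a made-up format of
five whitespace-separated fields per line (id, user, start|stop, ISO
timestamp, CPU count); you would need to adapt the parsing to the real
file. The figure it totals is elapsed seconds times CPU count, which,
per the first point above, is at best an upper bound on CPU time and
not the CPU time itself.

```python
from collections import defaultdict
from datetime import datetime


def summarize(lines):
    """Total elapsed-seconds * CPU-count per user from start/stop lines."""
    running = {}                 # id -> (user, start time, CPU count)
    totals = defaultdict(float)  # user -> accumulated usage figure
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue             # skip blank or malformed lines
        run_id, user, event, stamp, ncpus = fields
        when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S")
        if event == "start":
            running[run_id] = (user, when, int(ncpus))
        elif event == "stop" and run_id in running:
            # Retrieve and delete the matching start entry in one step.
            user, started, cpus = running.pop(run_id)
            totals[user] += (when - started).total_seconds() * cpus
    return dict(totals)


# Hypothetical sample data in the assumed format.
sample = """\
101 alice start 2010-11-17T18:00:00 4
102 bob start 2010-11-17T18:05:00 1
101 alice stop 2010-11-17T18:10:00 4
102 bob stop 2010-11-17T18:06:00 1
""".splitlines()

print(summarize(sample))   # {'alice': 2400.0, 'bob': 60.0}
```

To run it against a real file, pass the open file object instead of the
sample list, e.g. summarize(open("usage.log")), where usage.log is a
placeholder name. Stop lines with no matching start are silently
ignored here; you may prefer to report them.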


-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |