Parsing logfile with multi-line loglines, separated by timestamp?

Skip Montanaro skip.montanaro at gmail.com
Tue Jun 30 11:47:16 EDT 2015


Maybe define a class which wraps a file-like object. Its __next__() method
(next() in Python 2) can buffer up lines, starting with one which
successfully parses as a timestamp and accumulating the rest until the next
timestamp or EOF is seen. A blank line alone isn't a safe delimiter, since
each report has one between its CPU and device sections. It can then return
the chunk as a list of strings, one massive string, or some higher-level
representation (presumably an instance of another class) which represents
one "paragraph" of iostat output.
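
A minimal sketch of that idea, assuming Python 3 and the timestamp format
shown in the quoted sample. The names TimestampedChunks, parse_timestamp and
TIMESTAMP_FORMAT are made up for illustration, and the tiny sample string
stands in for real iostat output. It ends each chunk at the next timestamp
line, because the sample reports contain a blank line between the CPU and
device sections.

```python
import io
from datetime import datetime

# Assumed format of the "iostat -xt" timestamp lines in the sample,
# e.g. "06/30/2015 03:09:17 PM". Parsing AM/PM via %p depends on the locale.
TIMESTAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_timestamp(line):
    """Return a datetime if the line is a timestamp, otherwise None."""
    try:
        return datetime.strptime(line.strip(), TIMESTAMP_FORMAT)
    except ValueError:
        return None


class TimestampedChunks:
    """Hypothetical wrapper yielding (timestamp, lines) per report."""

    def __init__(self, fileobj):
        self._lines = iter(fileobj)
        self._pending = None  # timestamp already read for the next chunk

    def __iter__(self):
        return self

    def __next__(self):
        stamp, self._pending = self._pending, None
        # Skip anything before the first timestamp line; StopIteration
        # from next() at EOF simply ends the iteration.
        while stamp is None:
            stamp = parse_timestamp(next(self._lines))
        body = []
        for line in self._lines:
            following = parse_timestamp(line)
            if following is not None:
                self._pending = following  # this line starts the next chunk
                break
            if line.strip():  # drop blank separator lines
                body.append(line.rstrip("\n"))
        return stamp, body

    next = __next__  # Python 2 spelling of the same method


# Stand-in data, not real iostat output.
sample = io.StringIO(
    "06/30/2015 03:09:17 PM\n"
    "avg-cpu:  %user\n"
    "           0.03\n"
    "\n"
    "Device:   rrqm/s\n"
    "xvdb        0.00\n"
    "\n"
    "06/30/2015 03:09:18 PM\n"
    "avg-cpu:  %user\n"
)
chunks = list(TimestampedChunks(sample))
for stamp, body in chunks:
    print(stamp, len(body))
```

Returning a (timestamp, list of lines) pair is just one of the options
mentioned above; the body could as easily be handed to another class that
splits out the avg-cpu and Device tables.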

Skip


On Tue, Jun 30, 2015 at 10:24 AM, Victor Hooi <victorhooi at gmail.com> wrote:

> Hi,
>
> I'm trying to parse iostat -xt output using Python. The quirk with iostat
> is that the output for each second runs over multiple lines. For example:
>
> 06/30/2015 03:09:17 PM
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.03    0.00    0.03    0.00    0.00   99.94
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> xvdap1            0.00     0.04    0.02    0.07     0.30     3.28    81.37     0.00   29.83    2.74   38.30   0.47   0.00
> xvdb              0.00     0.00    0.00    0.00     0.00     0.00    11.62     0.00    0.23    0.19    2.13   0.16   0.00
> xvdf              0.00     0.00    0.00    0.00     0.00     0.00    10.29     0.00    0.41    0.41    0.73   0.38   0.00
> xvdg              0.00     0.00    0.00    0.00     0.00     0.00     9.12     0.00    0.36    0.35    1.20   0.34   0.00
> xvdh              0.00     0.00    0.00    0.00     0.00     0.00    33.35     0.00    1.39    0.41    8.91   0.39   0.00
> dm-0              0.00     0.00    0.00    0.00     0.00     0.00    11.66     0.00    0.46    0.46    0.00   0.37   0.00
>
> 06/30/2015 03:09:18 PM
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.00    0.00    0.50    0.00    0.00   99.50
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> xvdap1            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
> xvdb              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
> xvdf              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
> xvdg              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
> xvdh              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
> dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>
> 06/30/2015 03:09:19 PM
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.00    0.00    0.50    0.00    0.00   99.50
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> xvdap1            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
> xvdb              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
> xvdf              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
> xvdg              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
> xvdh              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
> dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>
> Essentially I need to parse the output in "chunks", where each chunk is
> separated by a timestamp.
>
> I was looking at itertools.groupby(), but that doesn't seem to quite do
> what I want here - it seems more for grouping lines, where each is united
> by a common key, or something that you can use a function to check for.
>
> Another thought was something like:
>
>     for line in f:
>         if line.count("/") == 2 and line.count(":") == 2:
>             current_time = datetime.strptime(line.strip(),
>                                              '%m/%d/%y %H:%M:%S')
>         while line.count("/") != 2 and line.count(":") != 2:
>             print(line)
>             continue
>
> But that didn't quite seem to work.
>
> Is there a Pythonic way of parsing the above iostat output and breaking it
> into chunks split by the timestamp?
>
> Cheers,
> Victor