CPU usage while reading a named pipe

Sun Sep 13 04:29:55 EDT 2009

Miguel P <prosper.spurius at gmail.com> wrote:
>  On Sep 12, 2:54 pm, Ned Deily <n... at acm.org> wrote:
> > In article
> > <da2362e0-ec68-467b-b50b-6067057d7... at y36g2000yqh.googlegroups.com>,
> >  Miguel P <prosper.spur... at gmail.com> wrote:
> > > I've been working on parsing (tailing) a named pipe which is the
> > > syslog output of the traffic for a rather busy haproxy instance. It's
> > > a fair bit of traffic (up to 3k hits/s per server), but I am finding
> > > that simply tailing the file in Python, without any processing, is
> > > taking up 15% of a CPU core. In contrast HAProxy takes 25% and syslogd
> > > takes 5% with the same load. `cat < /named.pipe` takes 0-2%
> >
> > > Am I just doing things horribly wrong or is this normal?
> >
> > > Here is my code:
> >
> > > from collections import deque
> > > import io, sys
> >
> > > WATCHED_PIPE = '/var/log/haproxy.pipe'
> >
> > > if __name__ == '__main__':
> > >     try:
> > >         log_pool = deque([],10000)
> > >         fd = io.open(WATCHED_PIPE)
> > >         for line in fd:
> > >             log_pool.append(line)
> > >     except KeyboardInterrupt:
> > >         sys.exit()
> >
> > > Deque appends are O(1) so that's not it. And I am using 2.6's io
> > > module because it's supposed to handle named pipes better. I have
> > > commented the deque appending line and it still takes about the same
> > > CPU.
> >
> > Be aware that the io module in Python 2.6 is written in Python and was
> > viewed as a prototype.  In the current svn trunk, what will be Python
> > 2.7 has a much faster C implementation of the io module backported from
> > Python 3.1.
> 
>  Aha, I will test with trunk and see if the performance is better. If
>  so, I'll use 2.6 in production until 2.7 comes out. I will report back
>  when I have made the tests.

Why don't you just try the builtin open() and read from it in big
chunks?

Something like this (tested with named pipes).  Tweak BUFFERSIZE and
SLEEP_INTERVAL for maximum performance!

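import sys
import time

BUFFERSIZE = 1024*1024
SLEEP_INTERVAL = 0.1

def tail(path):
    """Yield complete lines from path as they arrive, forever."""
    fd = open(path)
    buf = ""
    while True:
        data = fd.read(BUFFERSIZE)
        if not data:
            # Nothing to read right now: sleep instead of spinning,
            # even if buf still holds an unfinished line.
            time.sleep(SLEEP_INTERVAL)
            continue
        buf += data
        lines = buf.splitlines(True)
        # Hold back an incomplete last line until the rest arrives.
        if lines[-1].endswith("\n"):
            buf = ""
        else:
            buf = lines.pop()
        for line in lines:
            yield line

def main(path):
    for line in tail(path):
        # Parenthesised form works as a statement in 2.x and a call in 3.x
        print("%r:%r" % (len(line), line))

if __name__ == "__main__":
    main(sys.argv[1])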
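
If you want numbers to compare while tweaking, something along these
lines might help. It is only a rough sketch, not the tail() above: it
assumes Python 3 on a POSIX system (os.mkfifo), reads with os.read()
until the writer closes the pipe instead of tailing forever, and the
names read_lines_chunked, write_lines and measure are made up for
illustration. The CPU figure covers the whole process, including the
writer thread, so treat it as a way to compare reader variants against
each other rather than as an absolute cost.

```python
import os
import tempfile
import threading
import time

def read_lines_chunked(path, chunk_size=64 * 1024):
    """Yield complete lines (bytes) from path until all writers close it."""
    buf = b""
    fd = os.open(path, os.O_RDONLY)  # blocks until a writer opens the FIFO
    try:
        while True:
            data = os.read(fd, chunk_size)
            if not data:             # empty read: every writer has closed
                break
            buf += data
            lines = buf.split(b"\n")
            buf = lines.pop()        # trailing partial line, or b""
            for line in lines:
                yield line + b"\n"
    finally:
        os.close(fd)
    if buf:
        yield buf                    # final line without a newline

def write_lines(path, count):
    """Hypothetical load generator: write count short lines to the FIFO."""
    with open(path, "wb") as f:
        for i in range(count):
            f.write(b"hit %d\n" % i)

def measure(count=100000, chunk_size=64 * 1024):
    """Return (lines_read, cpu_seconds) for one run through a temp FIFO."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "test.pipe")
    os.mkfifo(path)
    writer = threading.Thread(target=write_lines, args=(path, count))
    writer.start()
    try:
        cpu_start = time.process_time()  # process-wide CPU time
        n = sum(1 for _ in read_lines_chunked(path, chunk_size))
        cpu_used = time.process_time() - cpu_start
    finally:
        writer.join()
        os.remove(path)
        os.rmdir(tmpdir)
    return n, cpu_used
```

Calling measure() with a few different chunk_size values and printing
the returned pair gives a quick feel for how much the read size matters
on your box.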

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick