Having trouble with tail -f standard input

Fri Aug 22 18:53:49 EDT 2008

Derek Martin wrote:
> On Thu, Aug 21, 2008 at 02:58:24PM -0700, sab wrote:
>> I have been working on a python script to parse a continuously growing
>> log file on a UNIX server.
> 
> If you weren't aware, there are already a plethora of tools which do
> this...  You might save yourself the trouble by just using one of
> those.  Try searching for something like "parse log file" on google or
> freshmeat.net or whatever...
> 
>> The input is the standard in, piped in from the log file.  The
>> application works well for the most part, but the problem is when
>> attempting to continuously pipe information into the application via
>> the tail -f command.  The command line looks something like this:
>>
>> tail -f <logfile> | grep <search string> | python parse.py
> 
> The pipe puts STDIN/STDOUT into "fully buffered" mode, which results
> in the behavior you're seeing.  You can set the buffering mode of
> those files in your program, but unfortunately tail and grep are not
> your program...  You might get this to work by setting stdin to
> non-blocking I/O in your Python program, but I don't think it will be
> that easy...
> 
> You can get around this in a couple of ways.  One is to call tail and
> grep from within your program, using something like os.popen()...
> Then set the blocking mode on the resulting files.  You'll have to
> feed the output of one to the input of the other, then read the output
> of grep and parse that.  Yucky.  That method isn't very efficient,
> since Python can do everything that tail and grep are doing for you...
> So I'd suggest you read the file directly in your python program, and
> use Python's regex parsing functionality to do what you're doing with
> grep.  
> 
> As for how to actually do what tail does, I'd suggest looking at the
> source code for tail to see how it does what it does.
>  
> But, if I were you, I'd just download something like swatch, and be
> done with it. :)

================================
I have to agree with Derek about using Python as the control here. Pipe
or otherwise redirect the incoming data to Python. If the incoming
stream is buffered, the program then ends only when it is forced to
(killed, or by a system shutdown or crash).

In Python,   print >>file, str        (see Python's lib.pdf)
paired with a plain print str, acts like
             incoming | tee -a file
in the sense of double output: one copy to a file and one to standard
out. str can be a line read from stdin; use .readline() there, since a
bare .read() waits for end of input. As long as it is a string, it
doesn't matter how it got there.
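
A minimal sketch of that tee-like copy, written for Python 3 (the
print >>file form above is Python 2). The name tee_lines and its three
file-like arguments are my own, made up for illustration.

```python
def tee_lines(source, log, out):
    """Copy each line from source to both log and out; return the count."""
    count = 0
    # readline() hands back one line as soon as it is available and
    # returns "" only at end of input, which is where iter() stops.
    for line in iter(source.readline, ""):
        log.write(line)
        log.flush()   # keep the log file current
        out.write(line)
        out.flush()   # pass the line on to the next stage right away
        count += 1
    return count
```

Called as tee_lines(sys.stdin, open("logfile", "a"), sys.stdout), it
would sit where tee -a logfile sits in a pipeline.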

Depending on your preference, in Unix terms, either:
incoming | tee -a logfile | program.py
incoming | program.py | programsub1.py
   where program.py copies everything to the (log)file,
   with all parsing done in the .py's
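
If the grep step moves into the .py as well, as Derek suggested, the
filtering could look roughly like this. It is only a sketch in
Python 3; matching_lines is a name I made up, and the pattern is
whatever you would have handed to grep.

```python
import re

def matching_lines(source, pattern):
    """Yield each line of source that matches the regular expression."""
    regex = re.compile(pattern)
    # Read one line at a time so a match is reported as soon as it arrives.
    for line in iter(source.readline, ""):
        if regex.search(line):
            yield line.rstrip("\n")
```

For example, for hit in matching_lines(sys.stdin, "ERROR"): would loop
over only the lines containing ERROR.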

The advantage is that Python can control its own buffering and so keep
the programs open and running, whether or not data is in the pipe at
the moment. This way the logfile gets the full data set and is not
disturbed any further. There is no need to work out where the last
record read is located.
                     OR
Last time I looked, syslog was NOT barred from writing to named pipes
(which are first in, first out (FIFO) by nature).
This allows pgm.py to read the named pipe, append everything it reads
to the log, parse each line as desired, sleep for a time when the pipe
is empty, and go again. Once more, sequence is maintained, with no
digging to find the last input tested.
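
A rough Python 3 sketch of that read, append, parse, sleep loop. The
names drain_pipe and handle, and the idle_sleep and max_idle
parameters, are hypothetical; max_idle is there only so the loop can be
stopped when trying it out. It assumes the writer sends whole lines.

```python
import time

def drain_pipe(pipe, log, handle, idle_sleep=1.0, max_idle=None):
    """Read lines from pipe, append each to log, and pass it to handle.

    When nothing is available, sleep and try again. With max_idle=None
    the loop runs until the process is stopped.
    """
    idle = 0
    while max_idle is None or idle < max_idle:
        line = pipe.readline()
        if line:
            idle = 0
            log.write(line)   # the log keeps the full data set, in order
            log.flush()
            handle(line.rstrip("\n"))
        else:
            # "" means nothing to read right now: wait, then go again.
            idle += 1
            time.sleep(idle_sleep)
```

With a real named pipe, pgm.py would pass open("named_pipe") as pipe;
on Unix that open() call itself waits until some writer has opened the
other end.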

Hope this helps.

Steve
norseman at hughes.net