[Python-es] Reading flat log files, logsurfer-style
César García
celord at gmail.com
Mon Apr 18 18:15:17 CEST 2011
The following reading was very useful to me:
http://www.dabeaz.com/coroutines/
In particular:
http://www.dabeaz.com/coroutines/follow.py
which is a "tail" done the Python way...
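
For reference, a minimal sketch in the spirit of that follow.py (the function name and the 0.1 s polling interval are illustrative, not necessarily the exact code behind the link):

import time

def follow(logfile):
    """Generator: yield lines as they are appended to an open file."""
    logfile.seek(0, 2)        # 2 = os.SEEK_END: jump to the end of the file
    while True:
        line = logfile.readline()
        if not line:
            time.sleep(0.1)   # nothing new yet; poll again shortly
            continue
        yield line

# usage:
#   for line in follow(open("access-log")):
#       print line,           # Python 2, as in the rest of this thread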
2011/4/18 Esteban Dauksis <esteban at dauksis.com>:
> I'm resending this; I hope it reaches the list this time.
>
> Regards
>
>
> Hi, a while ago I embarked on something similar... one thing nobody
> has mentioned so far is handling log file rotation/truncation... the
> best thing I found (I no longer remember where I got it from) is the
> following:
>
>
> # -*- coding: utf-8 -*-
> """
> Module to allow for reading lines from a continuously-growing file (such as
> a system log). Handles log files that get rotated/truncated out from under
> us. Inspired by the Perl File::Tail module.
>
> Example:
>
>     t = filetail.Tail("log.txt")
>     while True:
>         line = t.nextline()
>         # do something with the line
>
> or:
>
>     t = filetail.Tail("log.txt")
>     for line in t:
>         # do something
>         pass
> """
>
> from os import stat
> from os.path import abspath
> from stat import ST_SIZE
> from time import sleep, time
>
> class Tail(object):
>     """The Tail monitor object."""
>
>     def __init__(self, path, only_new=False,
>                  min_sleep=1,
>                  sleep_interval=1,
>                  max_sleep=60):
>         """Initialize a tail monitor.
>
>         path: filename to open
>         only_new: By default, the tail monitor will start reading from
>             the beginning of the file when first opened. Set only_new
>             to True to have it skip to the end when it first opens, so
>             that you only get the new additions that arrive after you
>             start monitoring.
>         min_sleep: Shortest interval in seconds to sleep when waiting
>             for more input to arrive. Defaults to 1.0 second.
>         sleep_interval: The tail monitor will dynamically recompute an
>             appropriate sleep interval based on a sliding window of
>             data arrival rate. You can set sleep_interval here to seed
>             it initially if the default of 1.0 second doesn't work for
>             you and you don't want to wait for it to converge.
>         max_sleep: Maximum interval in seconds to sleep when waiting
>             for more input to arrive. Also, if this many seconds have
>             elapsed without getting any new data, the tail monitor
>             will check to see if the log got truncated (rotated) and
>             will quietly reopen itself if this was the case. Defaults
>             to 60.0 seconds.
>         """
>         # remember path to file in case I need to reopen
>         self.path = abspath(path)
>         self.f = open(self.path, "r")
>         self.min_sleep = min_sleep * 1.0
>         self.sleep_interval = sleep_interval * 1.0
>         self.max_sleep = max_sleep * 1.0
>         if only_new:
>             # seek to current end of file
>             file_len = stat(path)[ST_SIZE]
>             self.f.seek(file_len)
>         self.pos = self.f.tell()    # where am I in the file?
>         self.last_read = time()     # when did I last get some data?
>         self.queue = []             # queue of lines that are ready
>         self.window = []            # sliding window for dynamically
>                                     # adjusting the sleep_interval
>
>     def _recompute_rate(self, n, start, stop):
>         """Internal function for recomputing the sleep interval. I get
>         called with a number of lines that appeared between the start and
>         stop times; this will get added to a sliding window, and I will
>         recompute the average interarrival rate over the last window.
>         """
>         self.window.append((n, start, stop))
>         purge_idx = -1    # index of the highest old record
>         tot_n = 0         # total arrivals in the window
>         tot_start = stop  # earliest time in the window
>         tot_stop = start  # latest time in the window
>         for i, record in enumerate(self.window):
>             (i_n, i_start, i_stop) = record
>             if i_stop < start - self.max_sleep:
>                 # window size is based on self.max_sleep; this record has
>                 # fallen out of the window
>                 purge_idx = i
>             else:
>                 tot_n += i_n
>                 if i_start < tot_start: tot_start = i_start
>                 if i_stop > tot_stop: tot_stop = i_stop
>         if purge_idx >= 0:
>             # clean the old records out of the window (slide the window)
>             self.window = self.window[purge_idx+1:]
>         if tot_n > 0:
>             # recompute; stay within bounds
>             self.sleep_interval = (tot_stop - tot_start) / tot_n
>             if self.sleep_interval > self.max_sleep:
>                 self.sleep_interval = self.max_sleep
>             if self.sleep_interval < self.min_sleep:
>                 self.sleep_interval = self.min_sleep
>
>     def _fill_cache(self):
>         """Internal method for grabbing as much data out of the file as is
>         available and caching it for future calls to nextline(). Returns
>         the number of lines just read.
>         """
>         old_len = len(self.queue)
>         line = self.f.readline()
>         while line != "":
>             self.queue.append(line)
>             line = self.f.readline()
>         # how many did we just get?
>         num_read = len(self.queue) - old_len
>         if num_read > 0:
>             self.pos = self.f.tell()
>             now = time()
>             self._recompute_rate(num_read, self.last_read, now)
>             self.last_read = now
>         return num_read
>
>     def _dequeue(self):
>         """Internal method; returns the first available line out of the
>         cache, if any."""
>         if len(self.queue) > 0:
>             line = self.queue[0]
>             self.queue = self.queue[1:]
>             return line
>         else:
>             return None
>
>     def _reset(self):
>         """Internal method; reopen the internal file handle (probably
>         because the log file got rotated/truncated)."""
>         self.f.close()
>         self.f = open(self.path, "r")
>         self.pos = self.f.tell()
>         self.last_read = time()
>
>     def nextline(self):
>         """Return the next line from the file. Blocks if there are no lines
>         immediately available."""
>
>         # see if we have any lines cached from the last file read
>         line = self._dequeue()
>         if line:
>             return line
>
>         # ok, we are out of cache; let's get some lines from the file
>         if self._fill_cache() > 0:
>             # got some
>             return self._dequeue()
>
>         # hmm, still no input available
>         while True:
>             sleep(self.sleep_interval)
>             if self._fill_cache() > 0:
>                 return self._dequeue()
>             now = time()
>             if now - self.last_read > self.max_sleep:
>                 # maybe the log got rotated out from under us?
>                 if stat(self.path)[ST_SIZE] < self.pos:
>                     # file got truncated and/or re-created
>                     self._reset()
>                     if self._fill_cache() > 0:
>                         return self._dequeue()
>
>     def close(self):
>         """Close the tail monitor, discarding any remaining input."""
>         self.f.close()
>         self.f = None
>         self.queue = []
>         self.window = []
>
>     def __iter__(self):
>         """Iterator interface, so you can do:
>
>         for line in filetail.Tail('log.txt'):
>             # do stuff
>             pass
>         """
>         return self
>
>     def next(self):
>         """Kick the iterator interface. Used under the covers to support:
>
>         for line in filetail.Tail('log.txt'):
>             # do stuff
>             pass
>         """
>         return self.nextline()
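>
>     # Python 3 compatibility alias (an addition; the original posting
>     # targets Python 2, where the iterator protocol calls next())
>     __next__ = next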
>
>
> I hope it helps. It worked for me and it keeps working :-)
> Regards
>
> Esteban
>
> On 29/03/2011, at 01:33, Carlos Herrera Polo wrote:
>
> Thanks everyone for your contributions. I am implementing a log
> reader using the ideas from the link that Angel suggested:
>
> http://stackoverflow.com/questions/136168/get-last-n-lines-of-a-file-with-python-similar-to-tail/136368#136368
>
> Many thanks to everyone for your contributions and help.
>
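> A minimal sketch of the "read blocks from the end" idea in that link
> (names are illustrative, not the exact code from the answer):
>
> import os
>
> def tail_lines(path, n):
>     """Return the last n lines of path, reading blocks from the end."""
>     with open(path, "rb") as f:
>         f.seek(0, os.SEEK_END)
>         end = f.tell()
>         block = 1024
>         while True:
>             seek_to = max(0, end - block)
>             f.seek(seek_to)
>             data = f.read(end - seek_to)
>             # stop once we have enough newlines or have read the whole file
>             if data.count(b"\n") >= n or seek_to == 0:
>                 break
>             block *= 2      # not enough lines yet: take a bigger block
>         return data.splitlines()[-n:]
>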
> On 28 March 2011 18:30, Chema Cortes <pych3m4 at gmail.com> wrote:
> On 27 March 2011 22:33, Carlos Herrera Polo
> <carlos.herrerapolo at gmail.com> wrote:
>
>> Can anyone suggest something, please?
>
> It can be done easily, and without resorting to external tools, by
> keeping the file open for reading and reading from it whenever you
> detect that its size has changed. Something like the code in this
> message:
>
> http://mail.python.org/pipermail/python-es/2003-May/002091.html
>
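> A minimal sketch of that size-polling approach (the function name and
> the 1-second poll are illustrative, not the code from that message):
>
> import os
> from time import sleep
>
> def watch(path):
>     """Yield new lines from path, polling its size to detect growth."""
>     f = open(path, "r")
>     f.seek(0, os.SEEK_END)      # start at the end, like tail -f
>     pos = f.tell()
>     while True:
>         if os.stat(path).st_size > pos:
>             # the file grew: read the newly appended lines
>             line = f.readline()
>             while line:
>                 yield line
>                 line = f.readline()
>             pos = f.tell()
>         else:
>             sleep(1)
>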
>
> On 27 March 2011 22:33, Carlos Herrera Polo
> <carlos.herrerapolo at gmail.com> wrote:
>> Illustrious list members, I would like your opinion on this...
>>
>> I administer several Linux servers and consolidate their log
>> information on a single server with the syslog-ng service. The logs
>> are flat files stored in ASCII format, one log per server/service.
>> I would like to develop a simple program that alerts me by e-mail
>> about certain events that occur in the logs... There is a program
>> called logsurfer, written in C, that does this...
>> My idea is to build a logsurfer but in Python. However, I don't know
>> how to handle reading these flat files, each of which must be several
>> megabytes in size and is written to every second. How would you do
>> it? Doing a full read of each one and then walking to the last rows
>> doesn't seem very efficient...
>>
>> Can anyone suggest something, please?
>>
>> Many thanks to everyone
>>
>> Regards
>>
>> --
>> Sent from my mobile device
--
http://celord.blogspot.com/