efficient 'tail' implementation

Mike Meyer mwm at mired.org
Thu Dec 8 02:09:58 EST 2005


s99999999s2003 at yahoo.com writes:
> I have a file which is very large eg over 200Mb , and i am going to use
> python to code  a "tail"
> command to get the last few lines of the file. What is a good algorithm
> for this type of task in python for very big files?
> Initially, i thought of reading everything into an array from the file
> and just get the last few elements (lines) but since it's a very big
> file, don't think is efficient. 

Well, 200MB isn't all that big these days. But it's easy to code:

# untested code
input = open(filename)
tail = input.readlines()[-tailcount:]   # last tailcount lines
input.close()

and you're done. However, it reads the whole file into memory. Faster
is probably working backwards from the end, though that may take
several tries to collect enough lines:

# untested code
import os
input = open(filename, 'rb')
filesize = os.path.getsize(filename)
blocksize = tailcount * expected_line_length
tail = []
# Need tailcount + 1 pieces, since the first one may be a partial line;
# stop doubling once we've already read the whole file.
while len(tail) < tailcount + 1 and blocksize < 2 * filesize:
    input.seek(-min(blocksize, filesize), 2)   # 2 = seek from end of file
    tail = input.read().splitlines()
    blocksize *= 2
input.close()
tail = tail[-tailcount:]

It would probably be more efficient to read blocks backwards and paste
them together, but I'm not going to get into that.
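For what it's worth, here is one way that block-at-a-time version might
look. This is just a sketch: the function name `tail_lines` and the
default block size are my own choices, not anything standard. It reads
fixed-size blocks backwards from EOF and pastes them onto the front of a
buffer until the buffer holds enough newlines, so it never re-reads data
the way the doubling version can.

```python
import os

def tail_lines(filename, tailcount, blocksize=4096):
    """Return the last tailcount lines of filename (as bytes),
    reading fixed-size blocks backwards from the end of the file."""
    f = open(filename, 'rb')
    try:
        f.seek(0, 2)          # 2 = seek relative to end of file
        pos = f.tell()        # start at EOF
        data = b''
        # Keep reading until we have tailcount + 1 newline-separated
        # pieces (the first piece may be a partial line) or hit the
        # start of the file.
        while pos > 0 and data.count(b'\n') <= tailcount:
            step = min(blocksize, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data   # paste new block on the front
        return data.splitlines()[-tailcount:]
    finally:
        f.close()
```

The only subtlety is counting one more newline than you need, so the
possibly-truncated first piece never ends up in the result.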

     <mike
-- 
Mike Meyer <mwm at mired.org>			http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.


