efficient 'tail' implementation
Mike Meyer
mwm at mired.org
Thu Dec 8 02:09:58 EST 2005
s99999999s2003 at yahoo.com writes:
> I have a file which is very large, e.g. over 200MB, and I am going to
> use Python to code a "tail" command to get the last few lines of the
> file. What is a good algorithm for this type of task in Python for
> very big files?
> Initially, I thought of reading everything into an array from the file
> and just getting the last few elements (lines), but since it's a very
> big file, I don't think that is efficient.
Well, 200MB isn't all that big these days. But it's easy to code:
# untested code
input = open(filename)
tail = input.readlines()[-tailcount:]   # a [:tailcount] slice would give the *first* lines
input.close()
and you're done. However, it will go through a lot of memory. Fastest
is probably working through it backwards, but that may take multiple
tries to get everything you want:
# untested code
import os

input = open(filename, 'rb')   # binary mode, so seeking relative to the end works
input.seek(0, os.SEEK_END)
size = input.tell()
blocksize = tailcount * expected_line_length
tail = []
while len(tail) < tailcount:
    offset = max(0, size - blocksize)   # don't seek before the start of the file
    input.seek(offset)
    tail = input.read().split(b'\n')
    if offset == 0:                     # whole file read; no point retrying
        break
    blocksize *= 2
input.close()
tail = tail[-tailcount:]
It would probably be more efficient to read blocks backwards and paste
them together, but I'm not going to get into that.
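For the curious, that block-at-a-time idea might look something like the sketch below. It is my own untested illustration, not code from the original post; the function name `tail_lines` and its parameters are invented for the example.

```python
import os

def tail_lines(filename, count, blocksize=4096):
    """Return the last `count` lines of a file by reading fixed-size
    blocks backwards from the end and pasting them together."""
    with open(filename, 'rb') as f:        # binary mode for absolute seeks
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b''
        # Walk backwards one block at a time until we have seen enough
        # newlines, or we run into the start of the file.
        while pos > 0 and data.count(b'\n') <= count:
            step = min(blocksize, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
        lines = data.split(b'\n')
        # A trailing newline yields one empty trailing piece; drop it.
        if lines and lines[-1] == b'':
            lines.pop()
        return [line.decode('utf-8', 'replace') for line in lines[-count:]]
```

The advantage over the doubling version is that nothing already read is ever re-read: the total I/O stays bounded by roughly the size of the requested lines plus one extra block.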
<mike
--
Mike Meyer <mwm at mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
More information about the Python-list mailing list