how to read the last line of a huge file???

Kushal Kumaran kushal.kumaran+python at gmail.com
Mon Jan 31 23:58:05 EST 2011


On Tue, Feb 1, 2011 at 9:12 AM, Alan Meyer <ameyer2 at yahoo.com> wrote:
> On 01/26/2011 04:22 PM, MRAB wrote:
>>
>> On 26/01/2011 10:59, Xavier Heruacles wrote:
>>>
>>> I have to do some log processing, and the logs are usually huge. The length of each
>>> line is variable. How can I get the last line?? Don't tell me to use
>>> readlines or something like linecache...
>>>
>> Seek to somewhere near the end and then use readlines(). If you
>> get fewer than 2 lines then you can't be sure that you have the entire
>> last line, so seek a little farther from the end and try again.
>
> I think this has got to be the most efficient solution.
>
> You might get the source code for the open source UNIX utility "tail" and
> see how they do it.  It seems to work with equal speed no matter how large
> the file is and I suspect it uses MRAB's solution, but because it's written
> in C, it probably examines each character directly rather than calling a
> library routine like readlines.
>
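
For comparison, here is a rough sketch of the block-wise backward scan
suggested above: read fixed-size blocks from the end of the file and look
for a newline byte in what has been read so far, rather than calling
readlines(). It is not taken from the tail source; tail_one and block_size
are made-up names, and the 4096-byte default is only an assumption.

```python
import os

def tail_one(filename, block_size=4096):
    """Return the last line as bytes by scanning blocks backwards from the end."""
    with open(filename, 'rb') as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        tail = b''
        while pos > 0:
            # Step back one block, but never past the start of the file.
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            tail = f.read(step) + tail
            # Ignore one trailing newline, then look for the newline before it.
            end = len(tail) - 1 if tail.endswith(b'\n') else len(tail)
            idx = tail.rfind(b'\n', 0, end)
            if idx != -1:
                return tail[idx + 1:end]
        # Reached the start of the file: it holds at most one line.
        return tail[:-1] if tail.endswith(b'\n') else tail
```

Prepending each block to tail copies the buffer every time, so this is
only reasonable when the last line is not enormous.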

How about mmapping the file and using rfind?

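import mmap
import os

def mapper(filename):
    # Map the file read-only and search backwards for the last two newlines.
    with open(filename, 'rb') as f:
        mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            endIdx = mapping.rfind(b'\n')
            startIdx = mapping.rfind(b'\n', 0, endIdx)
            return mapping[startIdx + 1:endIdx]
        finally:
            mapping.close()

def seeker(filename):
    # Read a growing chunk from the end until it holds at least two lines.
    offset = -10
    with open(filename, 'rb') as f:
        size = os.fstat(f.fileno()).st_size
        while True:
            # Never seek to before the start of the file.
            f.seek(max(offset, -size), os.SEEK_END)
            lines = f.readlines()
            # Stop once the chunk has two lines or covers the whole file.
            if len(lines) >= 2 or -offset >= size:
                return lines[-1][:-1]
            offset *= 2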

In [1]: import timeit

In [2]: timeit.timeit('finders.seeker("the-file")', 'import finders')
Out[2]: 32.216405868530273

In [3]: timeit.timeit('finders.mapper("the-file")', 'import finders')
Out[3]: 16.805877208709717

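the-file is a 120 MB file with ~500k lines.  The timings are totals in
seconds for timeit's default of 1,000,000 calls, so about 32 microseconds
per call for seeker and 17 for mapper.  Both functions assume the last
line has a trailing newline.  It's easy to correct if that's not the
case.  I think mmap works similarly on Windows, but I've never tried it
there.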
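
One possible way to make that correction for the mmap version, as a
sketch only: drop a single trailing newline if there is one, then search
for the newline before it. last_line is a made-up name, and returning an
empty bytes object for an empty file is my assumption, not something
mapper does.

```python
import mmap

def last_line(filename):
    """Return the last line of filename as bytes, without its newline.

    Hypothetical variant of mapper() that also accepts a file whose
    final line has no trailing newline. An empty file yields b''.
    """
    with open(filename, 'rb') as f:
        try:
            mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            # mmap refuses to map an empty file.
            return b''
        try:
            end = len(mapping)
            # Ignore one trailing newline, if there is one.
            if mapping[end - 1:end] == b'\n':
                end -= 1
            # rfind returns -1 when there is no earlier newline, so
            # start becomes 0 and the whole file is the last line.
            start = mapping.rfind(b'\n', 0, end) + 1
            return mapping[start:end]
        finally:
            mapping.close()
```

The result is bytes, so call .decode() on it with the log's encoding if
you need text.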

-- 
regards,
kushal


