Finding messages in huge mboxes

Cameron Laird claird at lairds.com
Tue Feb 3 10:23:42 EST 2004


In article <4f0a9fdb.0402022331.394b3002 at posting.google.com>,
Miki Tebeka <miki.tebeka at zoran.com> wrote:
>Hello Bastiaan,
>
>> I need to find messages in huge mbox files (50MB or more).
>> ...
>> Anyone who has a better idea?
>I find that sometimes using the little Unix utilities (which are
>available for M$ as well) gives very good performance.
>
>--- last.py ---
>#!/usr/bin/env python
>from os import popen
>from sys import argv
>
># Find last "From " line (the mbox message separator)
>last = popen("grep -n '^From ' %s | tail -1" % argv[1]).read()
>last = int(last.split(":")[0])
># Find total number of lines
>size = popen("wc -l %s" % argv[1]).read()
>size = int(size.split()[0].strip())
># Print the message
>print popen("tail -%d %s" % (size - last, argv[1])).read()
>--- last.py ---
>Took less than 1 sec on my computer on an 11MB mailbox.
			.
			.
			.
Absolutely.

Miki, I'd find this illustration even more compelling if it exploited
  commands.getoutput(.)
in place of your triplicated
  popen(.).read()
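
A minimal sketch of that substitution, not a definitive version. It
assumes a Unix-like system with grep, wc and tail on the PATH.
commands.getoutput exists only in Python 2; in Python 3 the same
function is available as subprocess.getoutput, which is what the
sketch calls. The function name last_message, the shlex quoting, and
the choice to anchor grep on the mbox "From " separator line are
illustrative assumptions, not part of last.py as posted.

```python
import shlex
import subprocess


def last_message(path):
    """Return the text following the last "From " separator in an mbox.

    Hypothetical helper: same three shell pipelines as last.py, with
    each popen(...).read() replaced by a single getoutput(...) call.
    """
    quoted = shlex.quote(path)  # assumption: guard against odd file names
    # Line number of the last mbox separator line
    last = subprocess.getoutput("grep -n '^From ' %s | tail -1" % quoted)
    last = int(last.split(":")[0])
    # Total number of lines in the file
    size = subprocess.getoutput("wc -l %s" % quoted)
    size = int(size.split()[0])
    # Everything after the separator line; getoutput drops the final newline
    return subprocess.getoutput("tail -n %d %s" % (size - last, quoted))
```

Called as last_message("inbox.mbox"), with a hypothetical file name,
it returns the last message as one string instead of printing it.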
-- 

Cameron Laird <claird at phaseit.net>
Business:  http://www.Phaseit.net


