Finding messages in huge mboxes

Miklós nospam at nowhere.hu
Mon Feb 2 15:56:39 EST 2004


What about putting it into a database like MySQL? <pyWink>
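
A minimal sketch of the database idea, assuming Python 3 and the standard-library sqlite3 module standing in for MySQL: scan the mbox once, store the byte offset of each "From " separator line, then seek straight to any message. The names build_index and fetch_message and the msg table are made up for illustration. The separator test (a "From " line at the start of the file or after a blank line) is a simplification that assumes body lines starting with "From " have already been escaped.

```python
import sqlite3


def build_index(mbox_path, db_path):
    """Scan the mbox once and store the byte offset of each message start."""
    con = sqlite3.connect(db_path)
    con.execute("DROP TABLE IF EXISTS msg")
    con.execute("CREATE TABLE msg (num INTEGER PRIMARY KEY, start INTEGER)")
    offsets = []
    pos = 0
    prev_blank = True  # the start of the file counts as "after a blank line"
    with open(mbox_path, "rb") as fp:
        for line in fp:
            # Assumption: a separator is a "From " line following a blank line.
            if prev_blank and line.startswith(b"From "):
                offsets.append((len(offsets), pos))
            prev_blank = line in (b"\n", b"\r\n")
            pos += len(line)
    con.executemany("INSERT INTO msg VALUES (?, ?)", offsets)
    con.commit()
    return con


def fetch_message(con, mbox_path, num):
    """Return the raw bytes of message `num` (0-based) without scanning the file."""
    row = con.execute("SELECT start FROM msg WHERE num = ?", (num,)).fetchone()
    if row is None:
        raise IndexError(num)
    nxt = con.execute("SELECT start FROM msg WHERE num = ?", (num + 1,)).fetchone()
    with open(mbox_path, "rb") as fp:
        fp.seek(row[0])
        # Read up to the next message start, or to end of file for the last one.
        return fp.read(nxt[0] - row[0]) if nxt else fp.read()
```

The index only has to be rebuilt when the mbox changes; after that, fetching a message near the end of a 50MB file costs one lookup and one seek.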

Miklós


"Bastiaan Welmers" <haasje at welmers.net> wrote in message
news:401eb54c$0$315$e4fe514c at news.xs4all.nl...
> Hi,
>
> I wondered if anyone has ever met this same mbox issue.
>
> I'm having the following problem:
>
> I need to find messages in huge mbox files (50MB or more).
> The following approach is (of course?) not very usable:
>
> import mailbox
>
> fp = open("mbox", "r")
> archive = mailbox.UnixMailbox(fp)
> i = 0
> # skip every message before the one we want
> while i < message_number_needed:
>    i += 1
>    archive.next()
>
> needed_message = archive.next()
>
> Especially because I often need messages at the end
> of the MBOX file.
> So I tried the following (scanning backwards from the end of
> the file, looking for "From " lines with readline()):
>
> i=0
> j=0
> while 1:
>   i+=1
>   fp.seek(-i, 2)  # 2 = seek relative to the end of the file
>   line = fp.readline()
>   if not line:
>      break
>   if line[:5] == 'From ':
>      j+=1
>      if j == total_messages - message_number_needed:
>         # point at the "From " line itself so next() returns this message
>         archive.seekp = fp.tell() - len(line)
>         message = archive.next()
>         # message found
>         break
>
> But this also seems to be slow and CPU-intensive.
>
> Anyone who has a better idea?
>
> Regards,
>
> Bastiaan Welmers




