MemoryError on reading mbox file

Istvan Albert istvan.albert at gmail.com
Wed Sep 12 10:39:46 EDT 2007


On Sep 12, 5:27 am, Christoph Krammer <redtige... at googlemail.com>
wrote:

>     string = self._file.read(stop - self._file.tell())
> MemoryError

This line reads an entire message into memory as a string. Is it
possible that you have a huge email in there (hundreds of MB) with
some attachment encoded as text?
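
One way to check for an oversized message without loading anything large is to scan the file line by line and measure the distance between separator lines. This is only a rough sketch: `message_sizes` is a name made up for illustration, and it assumes that every line starting with "From " begins a new message, as is conventional for mbox files.

```python
def message_sizes(path):
    """Yield (start_offset, size_in_bytes) for each message in an mbox file.

    Assumption: any line starting with b"From " opens a new message.
    Only one line is held in memory at a time.
    """
    start = None   # byte offset where the current message began
    offset = 0     # byte offset of the line being examined
    with open(path, "rb") as f:
        for line in f:
            if line.startswith(b"From "):
                if start is not None:
                    yield start, offset - start
                start = offset
            offset += len(line)
    # Emit the final message, if the file contained any.
    if start is not None:
        yield start, offset - start
```

Something like `max(message_sizes("mail.mbox"), key=lambda t: t[1])` (with a placeholder file name) would then report the offset and size of the largest message.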

Either way, the truth is that many modules in the standard library are
not well equipped to deal with large amounts of data. Many of them
were developed before gigabyte-sized files were even possible to store,
let alone process. Hopefully Python 3000 (P3K) will alleviate many of
these problems by its extensive use of generators.

For now I would recommend that you split your mbox file into several
smaller ones (I think all you need is to split at the lines starting
with "From ", which mark the beginning of each message) and run your
script on these individual files.
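
A splitting script could look roughly like the sketch below. The function name `split_mbox`, the `.partNNN` file naming, and the 100 MB default are all made up for illustration. It assumes each message begins with a line starting with "From ", and it only cuts at such lines so every message stays whole.

```python
def split_mbox(path, max_bytes=100 * 1024 * 1024):
    """Split the mbox at `path` into numbered part files.

    A new part is started at the first message boundary (a line that
    starts with b"From ") after the current part reaches `max_bytes`.
    Returns the list of part file names that were written.
    """
    part_paths = []
    out = None
    written = 0
    try:
        with open(path, "rb") as src:
            for line in src:
                starts_message = line.startswith(b"From ")
                if out is None or (starts_message and written >= max_bytes):
                    if out is not None:
                        out.close()
                    part_path = "%s.part%03d" % (path, len(part_paths))
                    part_paths.append(part_path)
                    out = open(part_path, "wb")
                    written = 0
                out.write(line)
                written += len(line)
    finally:
        if out is not None:
            out.close()
    return part_paths
```

Note that this keeps each part small but cannot shrink a single message: if one email is itself hundreds of MB, the part that contains it will still trigger the same MemoryError.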

i.










