[Email-SIG] Design Thoughts Summary
Eric S. Johansson
esj at harvee.org
Mon Jan 4 03:25:56 CET 2010
On 11/15/2009 1:01 PM, Barry Warsaw wrote:
> On Nov 14, 2009, at 5:12 PM, Matthew Dixon Cowles wrote:
>
>> Thank you. I am virtually 100% in agreement that this document
>> represents what people have agreed on and that it represents what is
>> sensible to do.
>
> As am I. Fantastic work in pulling this all together David.
>
> I'm a bit slammed right now, but a quick comment...
>
>>> * The API needs to at a minimum have hooks available for an
>>> application to store data on disk rather than holding everything in
>>> memory.
>>
>> I remain unconvinced that this is worth the trouble. Yes, the Twisted
>> folks say that they can't use the email module because they may be
>> receiving hundreds of messages at once. But can anyone do anything
>> with hundreds of messages at once other than write them to disk?
>>
>> And would anything actually be improved by reading hundreds of files
>> at once, in small chunks, looking for MIME separators?
>
> Mailman has a similar problem. Even if we get just a few big messages,
> they can crush the system. You could argue that the MTA should just
> block messages with 50MB bodies if the underlying Mailman code can't
> handle it, but I still think we can do better.
>
> I think we're fine if all the headers and MIME structure were kept in
> memory it would be fine. But I do think we just want to be able to never
> store the raw body content in memory (perhaps unless needed, on demand).
> Mailman for example rarely cares about the bytes of say an image/jpeg body.
For what it's worth, I've also experienced the same "crushing blow" caused by
large messages in memory. In my case, I immediately dumped all messages to a
database (unfortunately, SQL), extracted the essential metadata I needed for my
application, and kept it in the record so I could index and search on it. I also
stored the raw message and the processed message in the database. The reason
is that I wanted to be able to analyze the raw message if something failed
(usually a Unicode failure) and to retrieve the e-mail object from its
JSON container for quick(er) processing than I would get from parsing the raw
message again (and again).
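
To make the approach above concrete, here is a minimal sketch, not the code I actually ran. The table layout, the column names, and the helpers open_archive, store_message, load_processed and load_raw are all made up for illustration; only the standard-library email, json and sqlite3 calls are real.

```python
import email
import json
import sqlite3


def open_archive(path=":memory:"):
    """Open a (hypothetical) message archive and create its schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY,"
        " sender TEXT,"             # extracted metadata, indexed below
        " subject TEXT,"
        " raw BLOB NOT NULL,"       # untouched bytes, kept for post-mortems
        " processed TEXT NOT NULL"  # JSON form, avoids re-parsing
        ")"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sender ON messages (sender)")
    return conn


def _text(value):
    # Header values may be str or Header objects; normalise to str or None.
    return None if value is None else str(value)


def store_message(conn, raw_bytes):
    """Parse once, then store raw bytes, metadata and a JSON summary."""
    msg = email.message_from_bytes(raw_bytes)
    processed = json.dumps({"headers": [[k, str(v)] for k, v in msg.items()]})
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO messages (sender, subject, raw, processed)"
            " VALUES (?, ?, ?, ?)",
            (_text(msg.get("From")), _text(msg.get("Subject")),
             raw_bytes, processed),
        )
    return cur.lastrowid


def load_processed(conn, message_id):
    """Return the JSON summary without touching the raw message."""
    row = conn.execute(
        "SELECT processed FROM messages WHERE id = ?", (message_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def load_raw(conn, message_id):
    """Return the original bytes, e.g. to debug a decoding failure."""
    row = conn.execute(
        "SELECT raw FROM messages WHERE id = ?", (message_id,)
    ).fetchone()
    return bytes(row[0]) if row else None
```

The point of keeping both columns is that the common path reads only the small JSON summary, while the raw bytes stay available for the rare case where parsing went wrong.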
This experience makes me a supporter of an e-mail module that has a storage
container object that can be searched by any number of metadata fields. These
metadata fields would consist of internal (to the message) data sources and
external data sources. I believe it would be necessary to specify what
searchable fields you want before creating the storage container.
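
As a rough illustration of what I mean, here is a sketch of such a container. The MessageStore class and its method names are purely hypothetical, not a proposed API, and the in-memory indexes stand in for whatever a real backend would provide. It assumes the searchable fields are declared up front and that external metadata is passed in alongside the message.

```python
import email
from collections import defaultdict


class MessageStore:
    """Hypothetical container; searchable fields are fixed at creation."""

    def __init__(self, searchable_fields):
        self.searchable_fields = tuple(searchable_fields)
        self._raw = {}  # message id -> raw bytes
        # One index per declared field: field -> value -> set of message ids.
        self._index = {f: defaultdict(set) for f in self.searchable_fields}
        self._next_id = 0

    def add(self, raw_bytes, external=None):
        """Store a message; index header values and external metadata."""
        external = external or {}
        msg = email.message_from_bytes(raw_bytes)
        message_id = self._next_id
        self._next_id += 1
        self._raw[message_id] = bytes(raw_bytes)
        for field in self.searchable_fields:
            # External sources win; otherwise fall back to a header lookup.
            value = external.get(field)
            if value is None:
                value = msg.get(field)
            if value is not None:
                self._index[field][str(value)].add(message_id)
        return message_id

    def search(self, **criteria):
        """Return ids of messages matching every given field exactly."""
        unknown = set(criteria) - set(self.searchable_fields)
        if unknown:
            raise KeyError("fields not declared searchable: %s" % sorted(unknown))
        ids = None
        for field, value in criteria.items():
            matches = self._index[field].get(str(value), set())
            ids = set(matches) if ids is None else ids & matches
        return sorted(ids or [])

    def raw(self, message_id):
        """Return the stored bytes for one message."""
        return self._raw[message_id]
```

A caller would create the store with something like MessageStore(["from", "subject", "spam_score"]), where "spam_score" is an example of an external field supplied by the application rather than found in the message.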
I hope that it would be possible to make the storage container backend
independent of the storage technology so that people like me who will detest SQL
until the heat death of the universe can use something else to store mail
messages. I would also recommend not depending on the file system because, in my
experience, performance declined dramatically at around 500 messages (ext3 and
JFS). Even though I was using an SQL database (SQLite), the database was
significantly faster than the file system.
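
One way backend independence could look, sketched under my own assumptions: the StorageBackend interface, the two example backends and the archive helper below are hypothetical names, and a real design would need more than put and get. The idea is only that calling code never sees which technology holds the bytes.

```python
import sqlite3
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical minimal interface a storage container could rely on."""

    @abstractmethod
    def put(self, key, raw_bytes):
        """Store raw message bytes under key."""

    @abstractmethod
    def get(self, key):
        """Return the bytes stored under key, or raise KeyError."""


class DictBackend(StorageBackend):
    """Non-SQL example: a plain in-memory mapping."""

    def __init__(self):
        self._data = {}

    def put(self, key, raw_bytes):
        self._data[key] = bytes(raw_bytes)

    def get(self, key):
        return self._data[key]


class SQLiteBackend(StorageBackend):
    """SQL example using the standard-library sqlite3 module."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS messages"
            " (key TEXT PRIMARY KEY, raw BLOB NOT NULL)"
        )

    def put(self, key, raw_bytes):
        with self._conn:  # commit on success, roll back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO messages (key, raw) VALUES (?, ?)",
                (key, bytes(raw_bytes)),
            )

    def get(self, key):
        row = self._conn.execute(
            "SELECT raw FROM messages WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return bytes(row[0])


def archive(backend, key, raw_bytes):
    """Application code: works with any backend that honours the interface."""
    backend.put(key, raw_bytes)
    return backend.get(key)
```

Swapping DictBackend() for SQLiteBackend() changes nothing in archive, which is the property I am asking for.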
Thanks to all who are working on this project. I wish I could participate more,
but life has other plans for me.