[Mailman-Developers] Requirements for a new archiver

Brad Knowles brad.knowles at skynet.be
Wed Oct 29 15:25:53 EST 2003


At 11:54 AM -0800 2003/10/29, Peter C. Norton wrote:

>  It always confounds me that people will go for database voodoo and
>  deride filesystems when a filesystem is a highly specialised database
>  in and of itself.

	I am aware of that.  I was aware of that when I first gave my 
invited talk entitled "Design and Implementation of Highly Scalable 
E-mail Systems", which you can find at 
<http://www.shub-internet.org/brad/papers/dihses/>.

	Note that Eric Allman (author of the original Ingres database, 
among many other things) and Kirk McKusick (author of the Berkeley 
Fast File System) were in the audience.  I did not embarrass myself.

>  Databases aren't meant to be storage for abstract binary data.
>  They're meant to be a searchable index of data of types they
>  understand.

	Correct.  And despite all claims to the contrary from the 
vendors, no database properly "understands" binary large objects, nor 
do they give you another datatype they do actually understand that 
would be suitable for the storage of e-mail message bodies.

>  Assuming I had a clean slate to start a database project for a mail
>  store, personally I'd much rather prototype it in something like
>  postgresql where I could add data types to deal with email.  I could
>  then make header types, text types, mime types classes, etc.  Then I
>  could test to see if it was a good idea to implement it.

	IMO, that would be an exercise in futility.  We've been down this 
road a million times before.  We don't need to go down it again to 
know that the result is not likely to be successful, especially when 
we have alternatives that are proven to work well -- we store the 
message meta-data in the database, and then the message bodies in a 
separate message store akin to INN timecaf/timehash "heaps" (see 
<http://www.shub-internet.org/brad/papers/dihses/lisa2000/sld090.htm>).
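
	As a rough illustration of that split, here is a minimal sketch. 
It assumes SQLite for the meta-data and a single append-only file for 
the bodies.  The names HeapStore, open_index, store_message and 
fetch_body are hypothetical, and the file layout is deliberately 
simplistic; it is not INN's actual timecaf/timehash on-disk format.

```python
import os
import sqlite3
import tempfile


class HeapStore:
    """Append-only file of message bodies (hypothetical, not INN's format)."""

    def __init__(self, path):
        self.path = path
        # Create the heap file if it does not exist yet.
        open(self.path, "ab").close()

    def append(self, body):
        """Append raw bytes; return (offset, length) for later retrieval."""
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(body)
        return offset, len(body)

    def read(self, offset, length):
        """Read back exactly one body, given its location in the heap."""
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)


def open_index(db_path=":memory:"):
    """Open the meta-data index; only types the database understands."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " message_id TEXT PRIMARY KEY,"
        " sender TEXT,"
        " subject TEXT,"
        " heap_offset INTEGER,"
        " heap_length INTEGER)"
    )
    return db


def store_message(db, heap, message_id, sender, subject, body):
    """Write the body to the heap, then record where it went."""
    offset, length = heap.append(body)
    db.execute(
        "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
        (message_id, sender, subject, offset, length),
    )
    db.commit()


def fetch_body(db, heap, message_id):
    """Look up the location in the index, then read the body from the heap."""
    row = db.execute(
        "SELECT heap_offset, heap_length FROM messages WHERE message_id = ?",
        (message_id,),
    ).fetchone()
    if row is None:
        return None
    return heap.read(*row)
```

	The database only ever sees short strings and integers, so 
searching by sender or subject never touches the message bodies.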

>  I think using a standard sql database for doing mail operations is
>  asking for trouble.  Standard databases don't know how to parse
>  rfc822/2822 headers and that means that you've got to either write a
>  whole lot of stored procedures in a clunky query language (or
>  java!?!?!) and then maintain it, or you've got to do it all in the
>  imap/pop3/whatever server which means a whole lot of yammering traffic
>  between the database and the I/P/W server all the time, which == slow.

	You don't ask the database to understand or parse RFC2822 headers 
or messages.  That's up to your application.  You just store data 
using the formats known to the database, and the message bodies 
according to the methods above.
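
	To make that concrete, here is a small sketch of the application 
doing the parsing itself, using Python's standard email package.  The 
function name extract_metadata and the choice of fields are 
assumptions made for illustration; a real archiver would likely want 
more headers and more careful handling of encoded values.

```python
import email
from email.utils import parseaddr, parsedate_to_datetime


def extract_metadata(raw):
    """Parse RFC 2822 headers in the application, not in the database.

    Returns plain strings (or None) that any SQL database can store
    in ordinary text columns.
    """
    msg = email.message_from_bytes(raw)

    # Reduce "Display Name <addr>" to the bare address.
    _, sender = parseaddr(msg.get("From", ""))

    # Normalise the Date header to ISO 8601; tolerate missing or bad dates.
    date_header = msg.get("Date")
    try:
        sent = parsedate_to_datetime(date_header).isoformat() if date_header else None
    except (TypeError, ValueError):
        sent = None

    return {
        "message_id": msg.get("Message-ID", "").strip(),
        "sender": sender,
        "subject": msg.get("Subject", ""),
        "sent": sent,
    }
```

	The resulting values go into ordinary columns, and the 
unparsed message goes to the separate message store unchanged.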

-- 
Brad Knowles, <brad.knowles at skynet.be>

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
     -Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)


