[Mailman-Developers] Re: Future of pipermail?
Chuq Von Rospach
chuqui@plaidworks.com
Tue, 21 Nov 2000 23:54:04 -0800
At 2:23 AM -0500 11/22/00, Bill Bumgarner wrote:
>Yes-- all as true with a database system... but require the addition
>of something
>other than just a vanilla HTTP server to implement the "get the data" part.
So does running mailman. So does running a search engine on the
archives. We've already committed to adding a bunch of stuff to a
vanilla HTTP server (heck, what started this whole discussion,
WebDAV, is adding something to a vanilla HTTP server). This is a
strawman -- we can't do anything close to what we need with a vanilla
HTTP server, so putting down one thing because it doesn't meet that
goal is wrong. NOTHING useful meets that goal.
>Everything on the list above save for the "with little addition"
>item can be done
>w/an out of the box apache.
um, arguable.
>
> Obviously, this does not include the administrative
>part-- the piece that does per list/per-site configuration requires some
>additional work regardless of a database or filesystem (or WebDAV)
>backing store.
oh -- everything, well, except, um... (giggle)
>Another advantage to a filesystem based archival arrangement is that it is
>*extremely* easy to write a random shell script or two that prunes data from
>crontab, rebuilds indices, moves stuff around, reformats things,
>archives off to
>archived archives, etc...
have you ever worked with NNTP? Because you're reinventing something
the NNTP people have spent years designing out of their systems,
because it has horrible scaling capabilities and is horribly resource
inefficient. Some of us were arguing that this was a bad design model
over a decade ago, and it's been proven by NNTP very nicely.
> Yes-- of course-- all of these operations can be done
>with a database backing store, as well, but it is significantly more
>difficult to
>develop such tools and install them into the system.
It is? have you done it? I have... (www.apple.com/signmeup). It's not
as bad as you think.
> Likely, this is somewhat of
>a perception issue, but most administrators will not hesitate to
>toss together a
>shell script to manage an archive of stuff, but will think twice
>before diddling a
>database.
yes, that's a perception issue, and it's also a strawman. I don't buy
that for a second. Numbers, please. you can't throw a strawman like
this out without backing data.
>Very large and very expensive databases scale very well-- MySQL does not.
sure it does. But even more important -- if I outgrow MySQL and need
to throw in a really big muther database like Oracle, I can fairly
easily. If your filesystem store system is outgrown and you need to
add capacity to scale it, how do you do it? you rearchitect from
scratch, probably going to a database-centric design.
> I
>agree that for truly huge, high traffic sites, moving away from a
>pure filesystem
>approach-- moving everything into, say, an ES10000 running one of
>the magawhompus
>Oracle/Sybase license-- would be the way to go.
Oh, please. I'm running big megawhompus stuff on E250s and E450s
(for my news servers), and MySQL no problemo. My big muther list
server is MySQL on an E250, handles 28,000 database updates a day on
average, sends out a few million emails a week, and spends most of
its time idle (the only huge CPU sink I have is bounce processing,
but then it takes time to process 300 megabyte bounce files).
> But I don't think that is what
>90% of the Mailman users are going to be using the system for and
>requiring-- or
>even encouraging the use of-- a database as a backing store for
>messages will not
>add value to those people.
nor is that what I'm proposing. I don't know why you're so database
averse, to be honest. but I think it's a personal aversion on your
part, not a legitimate technical issue.
>Considering most of the usage of an archive of messages....
>
> - write operations are infrequent, modifications pretty much non existant
>
> - retrieval tends to be extremely sporadic and is generally *not* evenly
>divided across the archive-- a relative few messages receive most of the hits
>
> - there are extremely limited ways of viewing the data; by author, date,
>thread, subject.... with MOST views focused on thread.
>
> - indices are periodically updated
>
>... I still believe that a webserver-reading-files-from-filesystem
>is going to be
>loads more effecient than a
>webserver-reading-data-from-client-server-connection-to-app-adaptor-reading-data-from-client-server-connection-to-multiuser-database.
Sorry, but my research doesn't agree. And I'm not sure I agree with
your idea of what goes on in archives, but I don't have numbers to
back myself up on that. It also ignores ancillary advantages of
databasing this stuff -- like the easy addition of content searching
and the ability to write really good, customized search capabilities.
In your way, you have to build technology (or borrow it, like HtDig)
to get that, so anything you might possibly save resource or
development wise in your model gets eaten by trying to do searching
right -- and one thing I *have* found from my users is that archives
without good search tools are pretty useless to them. So I consider
archive/search a single key integrated module, even if the
technologies are separate, and databasing stuff allows me to build a
lot of search power into the system, where your setup doesn't -- you
have to go and do it the hard way (and I've done that, and it
sucks...)
>
>Yes-- if you are doing textual searches *directly* against the filesystem,
>filesystems are a lose. But a database is not a whole lot better
>unless you blow
>the relatively major
Not true.
>Regardless of where the data is stored, searches would typically
>have to be backed
>by some kind of a indexing store-- be it in a dabase, a btree type solution
but one of the nice things about databases is they're written to
build indexes for you -- and the guys who write those indexing
routines are experts. so you leverage their strengths.
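As a rough illustration of letting the database build the indexes
(a sketch only: it uses Python's bundled sqlite3 module rather than
MySQL, and the table, column, and function names are made up for this
example, not taken from Mailman or pipermail):

```python
import sqlite3

def open_archive(path=":memory:"):
    """Open a toy archive; schema and names are hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id        INTEGER PRIMARY KEY,
               thread_id INTEGER NOT NULL,
               author    TEXT NOT NULL,
               subject   TEXT NOT NULL,
               sent      TEXT NOT NULL,   -- ISO date, so it sorts as text
               body      TEXT NOT NULL)"""
    )
    # One index per common archive view; the database keeps them current.
    for col in ("thread_id", "author", "subject", "sent"):
        conn.execute(
            f"CREATE INDEX IF NOT EXISTS idx_messages_{col} ON messages ({col})"
        )
    return conn

def add_message(conn, thread_id, author, subject, sent, body):
    cur = conn.execute(
        "INSERT INTO messages (thread_id, author, subject, sent, body) "
        "VALUES (?, ?, ?, ?, ?)",
        (thread_id, author, subject, sent, body),
    )
    conn.commit()
    return cur.lastrowid

def by_author(conn, author):
    return conn.execute(
        "SELECT subject, sent FROM messages WHERE author = ? ORDER BY sent",
        (author,),
    ).fetchall()

def by_thread(conn, thread_id):
    return conn.execute(
        "SELECT id, author FROM messages WHERE thread_id = ? ORDER BY sent",
        (thread_id,),
    ).fetchall()

def search_body(conn, term):
    # A leading-wildcard LIKE scans every row; serious content search
    # would want a full-text index instead. Shown only to keep it short.
    return conn.execute(
        "SELECT id, subject FROM messages WHERE body LIKE ? ORDER BY sent",
        (f"%{term}%",),
    ).fetchall()
```

The by-author, by-thread, by-date and by-subject views each become a
single indexed query here; the content search is the part that would
still need real full-text support.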
>solutions available. Building that index from a database or from a
>filesystem
>isn't really that much different in terms of difficulty though a
>number of folks
>would find the problem of walking a filesystem more approachable
>than walking a
>database.
I've done both. I disagree. filesystem-centric systems are system
intensive resource hogs that are fine for small to medium
installations but scale poorly, and which require re-architecting
when you outgrow them. Database-centric ones might be a little more
work up front, might be somewhat overkill for tiny sites, but even
for small ones, tend to break even, and scale upward basically
infinitely because you can swap in bigger horses based on your budget
-- but you don't need those horses. You can do really good stuff with
open source tools. All you need is some good database design. not
even great database design.
>With that said, I do feel strongly that an abstraction layer does little good
>without concrete implementations of the adaptors underneath. As such, I
>volunteer myself to write the adaptor to WebDAV and I'll tentatively volunteer
>Chuq to write the adaptor that speaks to a database backing store.
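For reference, the "abstraction layer plus concrete adaptors" idea
could look roughly like the following. This is a minimal sketch under
assumptions: the class and method names are hypothetical, message ids
are assumed to be integers, sqlite3 stands in for whatever database
would really be used, and no WebDAV adaptor is shown.

```python
import os
import sqlite3
from abc import ABC, abstractmethod

class ArchiveStore(ABC):
    """Hypothetical interface the archiver would code against."""

    @abstractmethod
    def put(self, msg_id: int, text: str) -> None: ...

    @abstractmethod
    def get(self, msg_id: int) -> str: ...

    @abstractmethod
    def list_ids(self) -> list: ...

class FilesystemStore(ArchiveStore):
    """One file per message under a root directory."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, msg_id):
        return os.path.join(self.root, f"{int(msg_id)}.txt")

    def put(self, msg_id, text):
        with open(self._path(msg_id), "w", encoding="utf-8") as f:
            f.write(text)

    def get(self, msg_id):
        try:
            with open(self._path(msg_id), encoding="utf-8") as f:
                return f.read()
        except FileNotFoundError:
            raise KeyError(msg_id) from None

    def list_ids(self):
        return sorted(
            int(name[:-4]) for name in os.listdir(self.root)
            if name.endswith(".txt")
        )

class SqliteStore(ArchiveStore):
    """Same interface, backed by a single database table."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages "
            "(id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
        )

    def put(self, msg_id, text):
        self.conn.execute(
            "INSERT OR REPLACE INTO messages (id, body) VALUES (?, ?)",
            (int(msg_id), text),
        )
        self.conn.commit()

    def get(self, msg_id):
        row = self.conn.execute(
            "SELECT body FROM messages WHERE id = ?", (int(msg_id),)
        ).fetchone()
        if row is None:
            raise KeyError(msg_id)
        return row[0]

    def list_ids(self):
        rows = self.conn.execute("SELECT id FROM messages ORDER BY id")
        return [r[0] for r in rows]
```

Code written against ArchiveStore does not care which backing store
sits underneath, which is the property being argued over here.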
Actually, chuq's planning on architecting large parts of mailman 3.0,
assuming Barry gives the Okay. but I volunteered for that weeks ago,
and have been whacking on concepts on and off since. And since a lot
of what I hope to do in mailman will be leveraged off work I've
already done or am planning to do for Other Things I Can't Admit To,
a lot of it is beyond "I think this will work, maybe" in thought...
Since 1994, I've architected and implemented about half a dozen
production e-mail systems, from really tiny things based on common
tools (first was Listproc, then majordomo, now Mailman, for the three
generations of my 'generic' servers), to really bastard big things to
corporate email systems... And in the next six months, I'm rewriting
my big muther almost from scratch to take it to the 25,000,000
subscriber capability and double delivery speed (again), and then
we're probably redoing the internal corporate (which supports ~15,000
lists) to handle list lookup and delivery/authentication on demand
via LDAP to the corporate databases (right now, we
snapshot/download/munge the data...). So a lot of where I think
Mailman ought to go is taking pieces of some of these boxes (with my
bosses' kind permission) and making it part of Mailman...
And trust me, it'll be a long time before even my biggest email
system needs Oracle or an ES10000. you can do wonders with a stack of
Ultra 5s slaved to a decent sized box like an E250 (in fact, in many
cases, that's a lot better). I apologize if this sounds like I'm
pulling rank, but -- you keep saying that things can't be done when I
know that's wrong, because I've already done them in some form or
another, or I've already done the design (and/or prototype) and know
how it'll work.
chuq
--
Chuq Von Rospach - Plaidworks Consulting (mailto:chuqui@plaidworks.com)
Apple Mail List Gnome (mailto:chuq@apple.com)
The vet said it was behavioral, but I prefer to think of it as genetic.
It cuts down on the liability -- Get Fuzzy