[Mailman-Developers] Re: Future of pipermail?

Bill Bumgarner bbum@codefab.com
Wed, 22 Nov 2000 00:49:46 -0500


Chuq Von Rospach wrote:

> At 11:23 PM -0500 11/21/00, Bill Bumgarner wrote:
>
> >- archival of messages is a lot more than just writing the bodies to
> >a web server and then generating some kind of automatic TOC/index.
>
> agreed completely. I'd take it a step further and say it probably
> shouldn't generate indexes at all, but that indexes should be
> generated when a user wants to access the archives, dynamically.
> That's probably the single major weakness of mhonarc.

I'd take it a step beyond THAT and say that this is really almost a per-list
issue.   I have mailing lists that:

    - are professional in nature, where I wouldn't mind even PAYING for a
realtime solution that was very user friendly

    - are low bandwidth, where on-demand indexing would not be an issue

    - are high bandwidth and free, where once-a-night indexing would be ideal.

[bunch of useful stuff that I agree with deleted]

>
>
> >- for the archival of plain text messages, WebDAV is overkill [as
> >Chuq mentions].  However, as soon as you move to archiving mime
> >attachments, it quickly becomes extremely advantageous to archive
> >various properties with the archived message pieces.
>
> but you can do that with a lot less overhead in MySQL by doing a
> focussed database. In fact, you could program a system to do this via
> DBI that'd work in any DBI-capable environment, so users could roll
> their own based on what they've already adopted. unless WebDAV gives
> us enough extra capabilities to be worthy of the specialization, my
> argument is (and will be) we program to a more general API (like
> DBI), so that we work in many environments, and if someone wants,
> they can program a DBI->WebDAV interface to attach to it. This way,
> we get DB, MySQL, PostGres, Oracle, ODBC, yadayada more or less for
> free, giving us functionality across multiple environments that users
> can tailor. If we program just to WebDAV, we don't get that.

This is where I disagree *very* strongly-- maybe not with the implementation
choice [DBI], but with the reasoning behind it.

I don't think archival should be treated as a database centric operation.  The
concept of archival falls very naturally into a static hierarchy of
collections/directories containing resources/files, with a bit of additional meta
information associated with some resources.   This is exactly the kind of
information archive that a web server is *designed* to serve optimally.   Adding
extra layers here or abstracting to a DBI really doesn't buy us much.

On its own, a web server serving a basic filesystem gives us:

    - efficient access to archives

    - basic per-site, per-list authentication

    - [with little addition] unified access/passwords between lists, etc...

    - almost *zero* overhead with *very little* implementation cost

WebDAV adds the ability to do advanced locking, easy meta information storage,
etc... but-- most importantly-- does not take away the very efficient presentation
of data naturally present within a filesystem of stuff served by a web server.

As well, a filesystem centric storage/presentation solution-- webdav or raw
filesystem-- solves *most* people's archiving problems *most* of the time.

I feel *very strongly* that the archival solution-- whether it be raw messages or
decoded messages-- should be centric to storing files in directories and serving
files from directories.
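
A minimal sketch of what "storing files in directories" could look like, in
Python since that is what Mailman is written in. Everything here is an
assumption made for illustration: the function name archive_message, the
list/YYYY-MM/NNNNNN.eml layout, and the JSON sidecar file standing in for the
"bit of additional meta information". It also assumes every message carries a
parseable Date header.

```python
import json
from email import message_from_string
from email.utils import parsedate_to_datetime
from pathlib import Path


def archive_message(root, list_name, raw_message, serial):
    """Store one raw message under root/list_name/YYYY-MM/ and return its path.

    Hypothetical layout: the raw message goes into NNNNNN.eml and a few
    headers go into NNNNNN.json beside it, so a plain web server can
    serve both without any database behind it.
    """
    msg = message_from_string(raw_message)
    # Assumes a valid Date header; a real archiver would need a fallback.
    sent = parsedate_to_datetime(msg["Date"])

    folder = Path(root) / list_name / f"{sent.year:04d}-{sent.month:02d}"
    folder.mkdir(parents=True, exist_ok=True)

    body_path = folder / f"{serial:06d}.eml"
    body_path.write_text(raw_message, encoding="utf-8")

    # Sidecar metadata, playing the role WebDAV properties would play.
    meta = {"subject": msg["Subject"], "from": msg["From"], "date": msg["Date"]}
    meta_path = body_path.with_suffix(".json")
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return body_path
```

Pointing a web server's document root at the archive root would then expose
the whole hierarchy, with per-list access control handled by the server's
usual per-directory mechanisms.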

The second reason I feel strongly that moving to a DBI based interface wouldn't
present that much of an advantage is that most people that need to actually store
the data in a database are going to have their own requirements surrounding
decoding, storage, indexing, and presentation of said database related content.
There are few *real* standards in terms of the storage of multimedia [MIME]
content into a database environment and, as such, the developer will likely have
to rip the data out of whatever our implementation prefers and into their own
storage subsystem.

In my experience [storing email into a database was actually a problem we had to
solve-- this is the implementation we successfully/effectively used], it is far
more convenient to provide an HTTP gateway [a real implementation would make the
transport modular rather than hardwiring HTTP] that delivers the processed, but
still relatively raw, messages to some other subsystem for subsequent parsing and
storage.   In our case, we used HTTP to deliver inbound messages to a WebObjects
application that parsed the message into EOs [enterprise objects] and persisted
those via the various APIs included with WebObjects.

Another way of looking at this: as soon as developers want to work with the data
in the context of a true database, most of them are also going to want to use
their tool-du-jour [WebObjects, ASP, EJB, PHP, Zope, etc...] to process that
information.   I would take a two-pronged approach to archival:

    - a simple [and modular] filesystem-esque store that keeps the data in a more
traditional manner, be it directly on the filesystem or via a WebDAV adaptor
(since WebDAV is very filesystem like, just w/HTTP as the protocol of choice)

    - an equally simple modular gateway that lets the developer/administrator
configure the system so that the data is delivered to their server of choice via
the protocol of choice

That will likely reduce the complexity of our implementation and make it more
attractive, since our codebase will be that much simpler and more approachable.
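
To make the gateway half of that concrete, here is a rough sketch in Python.
It is not what we built (that was WebObjects on the receiving end) and none of
these names exist in Mailman: http_gateway and deliver_to_all are hypothetical,
and the message/rfc822 content type is simply my assumption about what a
receiver would want. A gateway is just a callable that takes the raw message
bytes, which is what keeps the transport swappable.

```python
import urllib.request
from typing import Callable, List

# A gateway is any callable that accepts the raw message bytes.
Gateway = Callable[[bytes], None]


def http_gateway(url: str, timeout: float = 10.0) -> Gateway:
    """Build a gateway that POSTs each raw message to the given URL."""

    def deliver(raw: bytes) -> None:
        request = urllib.request.Request(
            url,
            data=raw,
            headers={"Content-Type": "message/rfc822"},
            method="POST",
        )
        # urlopen raises on connection errors and on 4xx/5xx responses.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()

    return deliver


def deliver_to_all(raw: bytes, gateways: List[Gateway]) -> List[Exception]:
    """Hand one message to every gateway; collect failures, don't stop."""
    errors: List[Exception] = []
    for gateway in gateways:
        try:
            gateway(raw)
        except Exception as exc:  # one broken receiver must not block the rest
            errors.append(exc)
    return errors
```

An administrator would configure a list of gateways per mailing list, e.g.
[http_gateway("http://archive.example.com/inbound")] (a made-up URL), and any
other protocol only needs another small function with the same shape.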

This is *not* to say that the DBI approach isn't the right way to go;  if a
generic DBI->filesystem, DBI->WebDAV, DBI->DB capable API were put together and
were relatively hidden from the user and casual developer, it might be a huge win.
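
For what it's worth, such a hidden generic API might be as small as the
following Python sketch. The class names (ArchiveStore, FilesystemStore,
SqliteStore) and the put/get interface are invented for this example, and
SQLite via Python's DB-API merely stands in for "any DBI-capable database";
a WebDAV backend would be a third subclass.

```python
import sqlite3
from abc import ABC, abstractmethod
from pathlib import Path


class ArchiveStore(ABC):
    """Hypothetical storage interface the rest of the archiver would use."""

    @abstractmethod
    def put(self, key: str, raw: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class FilesystemStore(ArchiveStore):
    """Keys are relative paths below a root directory (keys are trusted here)."""

    def __init__(self, root):
        self.root = Path(root)

    def put(self, key, raw):
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(raw)

    def get(self, key):
        return (self.root / key).read_bytes()


class SqliteStore(ArchiveStore):
    """Same interface, with each message stored as a BLOB in one table."""

    def __init__(self, database=":memory:"):
        self.conn = sqlite3.connect(database)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages (key TEXT PRIMARY KEY, raw BLOB)"
        )

    def put(self, key, raw):
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO messages VALUES (?, ?)", (key, raw)
            )

    def get(self, key):
        row = self.conn.execute(
            "SELECT raw FROM messages WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]
```

The casual user would only ever see the filesystem backend as the default;
the interface exists for the sites that need something else.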


>
> So it's choosing what the appropriate interfaces are that's as
> important as having interfaces. you don't program to a technology
> unless you have to -- you program to an interface that enables
> technologies. (image: this is chuqui. this is a dead horse. This is
> chuqui holding a whip...)

And bbum following with a club.... :-)  Agreed.

>
> >- ....restoring decoded attachments and reencoding back to their
> >original state with their original headers is an extremely cool
> >feature.
>
> Truly. And if we can support BLOBs in DBI, well, we don't have to
> write anything to disk and can generate an entire message out of a
> DBI database -- portable to any decent database.

But an order of magnitude less efficient than downloading the BLOB off of disk via
a webserver!

Generic access with simple access control is what *most* users/administrators
want  *most* of the time.   More complex/abstracted/portable access is less of a
requirement and *a lot* of the people with such requirements also have other
issues-- real or imagined-- that dictate that they really just want Mailman to
hand the stuff to them as quickly/easily as possible and be done with it.

> >- if we are to manage the complexity associated with the integration
> >of numerous technologies, it is only going to happen through well
> >refined and highly modular APIs....
>
> agreed. and to make it clear, I'm not arguing against WebDAV. I'm
> arguing that for something like this, you define the interface and
> see if you can build it in a way that you don't JUST get WebDAV, but
> support at a more abstracted level that gets you a range of supported
> technologies (and future capability for that yet discovered) for an
> incrementally greater amount of work. the trick is to find the right
> abstractions and the proper technology layer to attach that to.

Totally-- and I hope no one thought I was advocating WebDAV as the end all, be
all, only solution!

I feel strongly that abstraction is key, but that we should also provide decent,
production quality implementations of solutions to the very same set of problems
for which we build the generic abstracted/modularized APIs.

If Mailman is not fully functional "out of the box", then people will ignore it.
However, if it isn't also flexible enough to be integrated into their weird
environments (because every server on the web has weirdness), they'll bitch and
moan until they find something else that doesn't solve their problem to B&M
about....

b.bum