[Mailman-Developers] Want to Code... need some feedback

J C Lawrence claw@2wire.com
Thu, 05 Jul 2001 14:32:47 -0700


On Thu, 05 Jul 2001 14:16:53 -0700 
Charles Iliya Krempeaux <tnt@linux.ca> wrote:

> Hello, J C Lawrence <claw@kanga.nu> wrote:

>>> To do this, I think that e-mail messages should be dumped into a
>>> database.  (Since I have MySQL and PostgreSQL at my disposal,
>>> those are what I'll be able to support myself.)

>> I have some early proofs of concept done on having MHonArc
>> generate scripts which, when executed, insert their respective
>> message contents into PostgreSQL with the appropriate threading
>> links.  The code is based on the PHP- and template-based
>> archiving I already do at Kanga.Nu, merely taking the PHP
>> variable assignments the current system already produces and
>> instead having the back end use them as the values to insert into
>> the DB.
>> 
>> It works.  Kinda.  It's not pretty.  The reliance on PHP as an
>> intermediate layer should be removed (slightly messy, as MHonArc
>> insists on inserting HTML-style comments), thread handling
>> and generation need to be improved (they shouldn't rely on MHonArc
>> but should be dynamically generated), etc.

> My way of thinking, of having it designed, is that Mailman (using
> Python) directly dumps the e-mail messages into the database.

<nod>

> (Are there standard [or de facto standard] Python modules for
> accessing databases?... For accessing MySQL and PostgreSQL?)

Yes.
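
For illustration only: Python database modules generally follow the
DB-API specification (connect, cursor, execute, commit).  The sketch
below uses the standard library's sqlite3 module as a stand-in for a
MySQL or PostgreSQL driver.  The table layout and the store_message
helper are hypothetical, and the placeholder style ("?" here) differs
between drivers.

```python
import sqlite3  # stdlib DB-API 2.0 module, standing in for a MySQL/PostgreSQL driver


def store_message(conn, message_id, subject, body):
    """Insert one message row through the DB-API cursor interface."""
    cur = conn.cursor()
    # Parameters are passed separately so the driver handles quoting.
    cur.execute(
        "INSERT INTO messages (message_id, subject, body) VALUES (?, ?, ?)",
        (message_id, subject, body),
    )
    conn.commit()


# Hypothetical minimal schema, kept in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (message_id TEXT PRIMARY KEY, subject TEXT, body TEXT)"
)
store_message(conn, "<1@example.org>", "Want to Code", "Hello")
rows = conn.execute("SELECT message_id, subject FROM messages").fetchall()
```

Swapping in another DB-API driver should mostly mean changing the
connect() call and the placeholder style.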

> Then, standard PHP (and whatever other languages)
> bindings/libraries, to the database, can be provided.  That way,
> the database is the middle man.  And Mailman, and the PHP
> binding/library (and any other language binding/library) only
> depend on the database.  (And better still, Mailman is completely
> independent of the PHP [and vice versa].  Only the database
> structure matters.)

The reasons I don't want to do this:

  1) MIME
  2) national (and other) character sets
  3) content types (really a subset of MIME but a large enough
  problem to be unique)
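
To make those three points concrete, here is a rough sketch using
Python's standard email package.  The sample message and the
describe_parts helper are made up for the example.  It shows that a
header may need RFC 2047 decoding, that each part carries its own
charset and transfer encoding, and that some parts are not text at
all, so a single "body" column does not capture the message.

```python
from email import message_from_string
from email.header import decode_header, make_header

# Made-up sample: an encoded Subject, a Latin-1 text part, a binary part.
RAW = (
    "From: poster@example.org\n"
    "Subject: =?iso-8859-1?q?Gr=FC=DFe?=\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    'Content-Type: text/plain; charset="iso-8859-1"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Gr=FC=DFe\n"
    "--XX\n"
    "Content-Type: application/octet-stream\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "AAEC\n"
    "--XX--\n"
)


def describe_parts(msg):
    """Return (content type, charset, decoded text or None) per leaf part."""
    out = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no payload of their own
        payload = part.get_payload(decode=True)  # undo base64 / quoted-printable
        charset = part.get_content_charset()
        text = None
        if part.get_content_maintype() == "text":
            text = payload.decode(charset or "us-ascii", errors="replace")
        out.append((part.get_content_type(), charset, text))
    return out


msg = message_from_string(RAW)
subject = str(make_header(decode_header(msg["Subject"])))
parts = describe_parts(msg)
```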

> (To get a little deeper into the design...) the important things I
> see, to extract from each message (and also store), is:

Minimally:

  To: (multiple)
  From: 
  CC: (multiple)
  To: GECOS (multiple)
  From: GECOS
  CC: GECOS (multiple)
  Date:
  Receipt date
  Receipt address(es)  (multiple)
  MessageID
  References: (multiple)
  In-Reply-To: (computed if missing, flagged)
  Subject:  
  Prior subject (was: (...) matching, opportunistic history match)
  Message Body
  MIME Key (if any)
  MIME structure
  Indexes to external MIME items
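
A minimal sketch of pulling the header part of that list out of a
message with Python's standard email package.  The extract_fields
helper, the dict keys and the sample message are hypothetical.  Filling
a missing In-Reply-To from the last References entry is an assumption
made here, not a settled design.  Receipt data, prior subject and the
MIME items are left out.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime


def extract_fields(msg):
    """Collect the basic header fields of a parsed message into a dict."""

    def addrs(name):
        # getaddresses yields (display name, address) pairs; the display
        # name corresponds to the GECOS entries in the list above.
        return getaddresses(msg.get_all(name, []))

    references = (msg.get("References") or "").split()
    in_reply_to = msg.get("In-Reply-To")
    computed = False
    if not in_reply_to and references:
        # Assumption: treat the last References entry as the parent.
        in_reply_to = references[-1]
        computed = True

    date = msg.get("Date")
    return {
        "to": addrs("To"),
        "from": addrs("From"),
        "cc": addrs("Cc"),
        "date": parsedate_to_datetime(date) if date else None,
        "message_id": msg.get("Message-ID"),
        "references": references,
        "in_reply_to": in_reply_to,
        "in_reply_to_computed": computed,  # the "flagged" part
        "subject": msg.get("Subject"),
    }


# Made-up sample message with References but no In-Reply-To.
RAW = (
    "From: A Poster <poster@example.org>\n"
    "To: list@example.org, Other Person <other@example.org>\n"
    "Date: Thu, 05 Jul 2001 14:16:53 -0700\n"
    "Message-ID: <b@example.org>\n"
    "References: <a@example.org>\n"
    "Subject: Want to Code\n"
    "\n"
    "body\n"
)
fields = extract_fields(message_from_string(RAW))
```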

-- 
J C Lawrence                                       claw@kanga.nu
---------(*)                          http://www.kanga.nu/~claw/
The pressure to survive and rhetoric may make strange bedfellows