[Mailman-Developers] Requirements for a new archiver

J C Lawrence claw at kanga.nu
Wed Oct 29 21:16:20 EST 2003


On Wed, 29 Oct 2003 16:12:50 -0800 
Chuq Von Rospach <chuqui at plaidworks.com> wrote:
> On Oct 29, 2003, at 2:28 PM, Peter C. Norton wrote:

>> I may not have made it clear, but I'm focusing on the metadata.  Once
>> you've parsed rfc822/2822, then it may become easier to have things
>> in the database that can manipulate those types.  I.e. to be able
>> to do simple searches for a property of given arbitrary headers (w/o
>> having to have a database schema that consists of a few known headers
>> and "others" which you then have to treat as a blob or as text).

> my only real worry is that from what I've seen, 99.99% of the time,
> the user is going to want content searches. header stuff is fine, but
> of really low priority in the scheme of things (necessary to put
> useful things together, meaningless if you can't content/context
> search in fulltext).

I see two needs, for significantly different populations.  The first
wants a browsing interface keyed and indexed by date, thread, and
author.  The second wants full text search with rapid location and
retrieval of matching messages.  Often a single user will move between
the access methods, reading by thread, bouncing over to a search, then
reading everything an author has written that matches, then searching
again, etc.
As such two distinct sets of indexes seem called for: full text and
message meta-data.
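
As a rough illustration of the meta-data side only, here is a minimal
Python sketch that builds date, author, and thread indexes from raw
RFC 2822 messages.  The function name build_metadata_indexes and the
in-memory dict layout are assumptions made for this example, not an
existing Mailman or archiver interface.  Threading is deliberately
simplified: it follows only In-Reply-To and assumes parents are seen
before their replies.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def build_metadata_indexes(raw_messages):
    """Return (by_date, by_author, by_thread) for raw RFC 2822 strings.

    Hypothetical helper for illustration; assumes every message has
    Message-ID, From, and Date headers with an explicit UTC offset.
    """
    by_date = []                     # sorted list of (datetime, message-id)
    by_author = defaultdict(list)    # sender address -> [message-id, ...]
    by_thread = defaultdict(list)    # thread root id -> [message-id, ...]
    root_of = {}                     # message-id -> thread root id

    for raw in raw_messages:
        msg = message_from_string(raw)
        mid = msg["Message-ID"].strip()
        by_date.append((parsedate_to_datetime(msg["Date"]), mid))
        by_author[parseaddr(msg["From"])[1].lower()].append(mid)

        # Simplified threading: a reply joins its parent's thread; an
        # unknown parent becomes the root of a new thread.
        parent = msg["In-Reply-To"]
        if parent:
            parent = parent.strip()
            root = root_of.get(parent, parent)
        else:
            root = mid
        root_of[mid] = root
        by_thread[root].append(mid)

    by_date.sort()
    return by_date, dict(by_author), dict(by_thread)
```

Each index stores only message ids, so the message bodies can live
wherever is convenient (flat files, mbox, a database) without affecting
the browsing views.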

> that's why I'm leaning, blob issues or no, towards full-text storage
> in MySQL 4. Because if you can't easily chop up the message body
> content and find the messages you want to deal with, elegant storage
> of the headers is irrelevant...

True.  However, this seems to conflate two distinct problems.  If
you're going to do unindexed searches then this makes sense, but
except for minimal cases that's an uninteresting space.  It scales like
crap and has an even worse feature set.  It is more interesting to split
storage and indexing into distinct solution designs, and to build or
pick something tailored for that smaller problem.  That way you don't do
full text searching, you do full text indexing and then search the
indexes.
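
To make the "index first, then search the index" distinction concrete,
here is a minimal Python sketch of an inverted index kept separate from
message storage.  The class name InvertedIndex, the tokenizer, and the
AND-only query semantics are all assumptions chosen for brevity; a real
indexer would add stemming, stop words, ranking, and language handling.

```python
import re
from collections import defaultdict

# Crude tokenizer: lowercase runs of ASCII letters and digits.
TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    return TOKEN.findall(text.lower())


class InvertedIndex:
    """Maps each term to the set of message ids whose body contains it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, msg_id, body):
        # Indexing happens once, when the message is archived.
        for term in tokenize(body):
            self.postings[term].add(msg_id)

    def search(self, query):
        # A query never touches message bodies; it intersects the
        # posting sets of its terms (AND semantics).
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], ()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


idx = InvertedIndex()
idx.add("m1", "Full text indexing in MySQL")
idx.add("m2", "Browsing by thread and author")
idx.add("m3", "Indexing headers, then full text search")
matches = idx.search("full text")
```

The cost of a search here depends on the size of the posting sets for
the query terms, not on the total volume of archived text, which is the
scaling property an unindexed scan lacks.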

> I think you need that, too. But until you get a reasonable context
> search for the message body, designing the rest is silly. 

Is searching message bodies really interesting, or is building indexes
of message bodies such that you can later search those indexes the
actually interesting point?  

> And it seems to me there are few better methods than dumping the text
> into MySQL and letting it do the work. Compromises, tradeoffs and etc
> notwithstanding...

How does MySQL help you in building language-sensitive rapid response
indexes of large text blobs?

-- 
J C Lawrence                
---------(*)                Satan, oscillate my metallic sonatas. 
claw at kanga.nu               He lived as a devil, eh?		  
http://www.kanga.nu/~claw/  Evil is a name of a foeman, as I live.


