[BangPypers] Mailman archives analysis

Anand Balachandran Pillai abpillai at gmail.com
Thu Jul 17 08:36:47 CEST 2008


Not really. I said it should be a client-side tool because:

 o Like Jeff said, not all archives would want to update server-side
    mailman just to use it.
 o Being server-side introduces additional things to worry about, like
    authentication, if we don't integrate with mailman and use it as a
    kind of plugin on the server side.
 o Last but most important, it introduces a learning curve: you have to
    learn and understand mailman before the tool can be used. So the
    effectiveness of the tool is limited to mailman administrators. I am
    thinking of a tool which anyone can use by just analyzing the mailman
    archive web pages as a client.

Looking at it from that perspective, the features would be:

  o Client side tool
  o Input is mailman archive web page - either the complete archive view
     or monthly view
  o Output is written to an sqlite database.
  o Tags support would be great - this can be done by finding out keywords
     in top conversations and creating a tag cloud based on the frequencies
     of these keywords.
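
As a rough sketch of the last two features (not a design for the actual
tool), keyword frequencies could be counted from message subjects and
stored in sqlite. The stop-word list, the table name "tags" and the
function names below are all assumptions made for illustration.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical stop-word list; a real tool would need a much larger one.
STOP_WORDS = {"the", "and", "for", "with", "from", "fwd", "bangpypers"}


def keyword_counts(subjects):
    """Count how often each keyword appears across the given subject lines."""
    counts = Counter()
    for subject in subjects:
        for word in re.findall(r"[a-z]+", subject.lower()):
            # Skip very short words (this also drops "re") and stop words.
            if len(word) > 2 and word not in STOP_WORDS:
                counts[word] += 1
    return counts


def save_tags(conn, counts):
    """Write keyword frequencies to an assumed 'tags' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tags "
        "(keyword TEXT PRIMARY KEY, frequency INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO tags VALUES (?, ?)", counts.items())
    conn.commit()
```

The frequencies in the "tags" table could then drive the font sizes of
a tag cloud.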

This is not a plug for HarvestMan. In fact, HarvestMan would be overkill
for a tool like this, since we really don't need a complete parse of the
pages to find what we want; smart regular expressions tailored towards
exactly the data we want would be enough. That said, I would use
Pyparsing here instead of regexp.
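
To illustrate what "tailored" extraction could look like, here is a
sketch using plain regular expressions rather than Pyparsing. The HTML
shape it expects is my assumption about how a pipermail monthly index
lists messages; check it against a real page before relying on it. The
names ENTRY_RE and extract_messages are made up for this example.

```python
import re

# Assumed shape of one entry in a pipermail date/thread index
# (the message number here is invented for illustration):
#   <LI><A HREF="003456.html">[List] Subject
#   </A><A NAME="3456">&nbsp;</A>
#   <I>Author Name
#   </I>
ENTRY_RE = re.compile(
    r'<LI><A HREF="(?P<href>\d+\.html)">(?P<subject>.*?)</A>'
    r'.*?<I>(?P<author>.*?)</I>',
    re.IGNORECASE | re.DOTALL,
)


def extract_messages(index_html):
    """Return a list of (href, subject, author) tuples found in index_html."""
    messages = []
    for match in ENTRY_RE.finditer(index_html):
        messages.append((
            match.group("href"),
            # Collapse line breaks and repeated spaces inside each field.
            " ".join(match.group("subject").split()),
            " ".join(match.group("author").split()),
        ))
    return messages
```

Nothing else on the page is parsed: only the link, subject and author
of each entry are pulled out.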

Instead the tool should start from scratch with its own minimal crawling
loop which extracts only the data we want and nothing more. The focus
is on analysis, not on crawling, if you get what I mean...
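
A minimal crawling loop in this spirit might look like the sketch
below. It assumes the archive's top page links to each month's date
index as "<year>-<Month>/date.html"; that link pattern, MONTH_RE and
crawl_archive are assumptions for illustration, not an existing API.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumed link pattern for monthly date indexes on the archive's top page.
MONTH_RE = re.compile(r'href="(\d{4}-[A-Za-z]+/date\.html)"', re.IGNORECASE)


def fetch(url):
    """Download a page and return it as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_archive(index_url, fetch=fetch):
    """Yield (month_url, html) for each monthly date index linked from index_url."""
    index_html = fetch(index_url)
    seen = set()
    for relative in MONTH_RE.findall(index_html):
        month_url = urljoin(index_url, relative)
        if month_url in seen:
            continue  # each month is fetched only once
        seen.add(month_url)
        yield month_url, fetch(month_url)
```

The loop follows only the monthly index links and ignores everything
else on the page; the fetch argument can be swapped for a stub so the
analysis code can be tested without touching the network.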

+1 for "MailWatcher"...

I will think about a list of features and a design for MailWatcher and post
it to the list soon.

--Anand

On Thu, Jul 17, 2008 at 6:19 AM, Pradeep Gowda <pradeep at btbytes.com> wrote:
>
> On Jul 16, 2008, at 8:40 PM, O.R.Senthil Kumaran wrote:
>>
>> If you look into the methods exposed by Mailman and modify them to add
>> those features to the web-interface, would it not be a better idea?  It
>> would be available to existing mailman users when they upgrade (or
>> patch it).
>>
>> Something like <mm:TopUsers=5/> in the web-interface which internally
>> calls the method for getting the top-posters.
>>
>
> But that would be no good for Anand's HarvestMan code-kata[1] ;)
>
> +PG
> [1] http://codekata.pragprog.com/codekata/
