[Mailman-Developers] GSOC 2013 project discussion

Avik Pal avikpal.me at gmail.com
Wed Apr 17 14:56:53 CEST 2013


          Thanks a lot for the information. The thing is, I don't think
the Spam classifier by itself is going to be big enough, so I came up
with this idea. I also need to know what the community wants regarding
e-mail delivery. As for the classifier, I don't think it is going to be a
problem at all from my end, given my previous experience in machine
learning and NLP; we just need a database for the subscribers where their
classifier data is going to be stored. But the most important thing is what
you have pointed out: "Is it the best use for our limited resources
(funding, mentor time, etc.)?" I am looking forward to hearing from Barry
and Terri in this regard.

         Meanwhile, it would be much appreciated if someone could direct me
to a labeled dataset available online.

         Also, somebody was talking about legal aspects in some countries,
and about the classification being done in the MTA only. Here I have a
suggestion: after submission, whenever an email is classified as Spam, we
store it in a separate archive, and at the end of the day send subscribers
a mail saying "this is the digest of all the mails that Mailman thinks are
Spam". The subscriber may go there, view them, and mark them as not Spam,
which will help the learning algorithm adjust the decision boundary;
precision and recall can also be worked out from these marks. A mail that
is reclassified after the boundary is adjusted, or that is marked as not
Spam by a majority (in simple words), will be incorporated back into the
main archive and sent as part of the main digest. Emails which stay as
Spam will be dropped after a month.
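
A rough sketch of the quarantine flow described above, just to make the
idea concrete. Everything here is hypothetical: the names SpamQuarantine,
hold, mark_not_spam and expire are mine and are not Mailman APIs, "majority"
is assumed to mean more than half of all subscribers, and "a month" is
approximated as 30 days. Retraining the classifier from the feedback is
left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Assumption: "dropped after a month" is approximated as 30 days.
RETENTION = timedelta(days=30)


@dataclass
class QuarantinedMessage:
    msg_id: str
    subject: str
    quarantined_at: datetime
    # Subscribers who marked this message as "not Spam".
    not_spam_votes: set = field(default_factory=set)


class SpamQuarantine:
    """Hypothetical separate archive for mails classified as Spam."""

    def __init__(self, subscribers):
        self.subscribers = set(subscribers)
        self.held = {}      # msg_id -> QuarantinedMessage
        self.released = []  # messages to put back into the main archive/digest

    def hold(self, msg_id, subject, now):
        """Store a mail that the classifier flagged as Spam."""
        self.held[msg_id] = QuarantinedMessage(msg_id, subject, now)

    def daily_digest(self):
        """Subjects of all currently held mails, for the end-of-day Spam digest."""
        return [m.subject for m in self.held.values()]

    def mark_not_spam(self, msg_id, subscriber):
        """Record one vote; return True if the mail was released by this vote."""
        msg = self.held.get(msg_id)
        if msg is None or subscriber not in self.subscribers:
            return False
        msg.not_spam_votes.add(subscriber)
        # Assumption: release once a strict majority of subscribers disagree.
        if 2 * len(msg.not_spam_votes) > len(self.subscribers):
            self.released.append(self.held.pop(msg_id))
            return True
        return False

    def expire(self, now):
        """Drop mails that stayed Spam for the whole retention period."""
        expired = [mid for mid, m in self.held.items()
                   if now - m.quarantined_at >= RETENTION]
        for mid in expired:
            del self.held[mid]
        return expired
```

The votes collected in not_spam_votes are also the labels one would use to
estimate precision and to retrain; estimating recall would additionally
need reports of Spam that slipped through, which this sketch does not model.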


Avik Pal
Bengal Engineering & Science University, Shibpur
github:https://github.com/avikpal
IRC:- irc://freenode/avikp,isnick
twitter:-https://twitter.com/avikpalme





On 17 April 2013 17:37, Richard Wackerbarth <rkw at dataplex.net> wrote:

> In evaluating a proposal, we need to look at a number of factors:
>
> First, will it work? -- Does the proposed design accomplish the stated
> objective?
> Next: Is it useful?
> And: Can the candidate be expected to accomplish the task within the
> allotted time frame?
> Finally: Is it the best use for our limited resources (funding, mentor
> time, etc.)?
>
> If your presentation makes it easier to answer each of those questions in
> a positive manner, it will increase the likelihood that it will get funded.
>
>
> On Apr 17, 2013, at 6:16 AM, Avik Pal <avikpal.me at gmail.com> wrote:
>
> For identifying an important message, a classifier will be implemented.
> And thanks for pointing out the issue regarding the delivery of the
> message. If it is delivered twice, then the existing implementation of
> delivery is sufficient, but if we want to deliver it only once, then for
> each person we need to maintain a database of mails/threads important to
> him (or vice versa) and check against that database while sending. This
> is going to raise some normalization issues, which are to be taken care
> of by careful design.
>
> Avik Pal
> Bengal Engineering & Science University,Shibpur
> github:https://github.com/avikpal
> IRC:- irc://freenode/avikp,isnick
> twitter:-https://twitter.com/avikpalme
>
>
>
>
>
> On 17 April 2013 01:02, Richard Wackerbarth <rkw at dataplex.net> wrote:
>
>> An interesting suggestion -- A couple of things to consider:
>>
>> How do you identify "important" messages?
>>
>> Will you deliver these messages twice -- first as important and then,
>> later, as a part of the digest ?
>>
>>
>> On Apr 16, 2013, at 2:13 PM, Avik Pal <avikpal.me at gmail.com> wrote:
>> > Also, I would like to propose an idea of my own. Many of us set the
>> > preference in Mailman to get all the emails of a day batched together,
>> > but sometimes this means we miss important mails (we do get them at
>> > the end of the day, but we miss the moment): mails important to the
>> > community or to my own interests, or discussion of something I have
>> > also discussed in my previous mails. Delivering these mails instantly
>> > to the subscriber, so that he can join in at that very moment, may
>> > turn out to be a very useful feature.
>> > Thus a person gets to choose between two options:
>> >        1. Receive batched mails only.
>> >        2. Receive batched mails with important mails delivered instantly.
>>
>>
>
>

