[Mailman-Developers] GSOC 2013 project discussion

Wed Apr 17 18:51:34 CEST 2013

  Yes, I get your point, but these issues are part of any machine learning
project, and feature extraction will have to be done with the synthetic
data set in mind.

On 17 April 2013 22:05, Terri Oda <terri at zone12.com> wrote:

>
>
> Finding sources of spam (like that one) isn't that hard; it's finding
> sources of legit email combined with spam and classified and processed in
> the same way that's challenging.  As I said, you can combine a spam source
> like this with a publicly available mailing list to make a synthetic set,
> but scientifically speaking, those aren't really preferred ways to handle
> data because they come from multiple sources.
>
>
>
    Well, in this regard the only thing I can do is keep looking. I am also
aware that data coming from different sources can be skewed, but these
things are never perfect and there is always scope for improvement. I
think our aim should be to implement a rudimentary classifier with
fairly good performance to start with.
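
To make the "rudimentary classifier" idea concrete, here is a minimal
sketch. It assumes a plain multinomial naive Bayes over word tokens with
add-one smoothing, which is only one possible starting point and not
something decided in this thread. The function names (tokenize,
build_synthetic_set, train, classify) are hypothetical and do not come
from any existing Mailman code. The synthetic set is simply a spam source
and a list archive labelled and merged, as discussed above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_synthetic_set(spam_msgs, ham_msgs):
    """Label and merge messages from two separate sources.

    Assumption: spam_msgs comes from a spam corpus and ham_msgs from a
    public list archive, so the two classes may differ in ways that are
    unrelated to spam itself (the skew mentioned above).
    """
    return [(m, "spam") for m in spam_msgs] + [(m, "ham") for m in ham_msgs]


def train(labelled):
    """Count tokens per class. Both classes must be non-empty."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in labelled:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the label with the higher log-probability score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        # Log prior for the class.
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore tokens never seen in training
            # Add-one (Laplace) smoothed log likelihood of the token.
            count = word_counts[label][tok] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)
```

With real data, the two lists passed to build_synthetic_set would be the
message bodies from each source, and some messages should be held back
from training so the performance can be measured.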