[Mailman-Developers] Boilerplate and content filtering [was: Introduction and Project Discussion]

Mon Apr 15 09:43:47 CEST 2013

On Mon, Apr 15, 2013 at 11:56 AM, Patrick Ben Koetter <p at sys4.de> wrote:

> * Stephen J. Turnbull <stephen at xemacs.org>:
> > Sreyanth writes:
> >
> >  > Also, I would like to hear more about: Boilerplate stripper AND Better
> >  > content-filtering / handling error messages.
> >  > Boilerplate stripping is trivial to understand. But, can anyone elaborate
> >  > on Better content-filtering / handling error messages?
> >
> > But boilerplate stripping is not necessarily trivial to implement,
> > because it's not always clear what boilerplate is.  I think it might
> > be a good idea to save it off and provide a link rather than discard
> > it, which leads to interesting questions of storage, shared links for
> > true boilerplate (storage compression of repeatedly encountered text,
> > yes, but more important the link will turn purple so you don't need to
> > click on it in the next message from that user!), and user interface
> > in general.
> >
> > Content filtering is mostly going to be about MIME handling: choice of
> > the appropriate text/* part and things like that, removing
> > images/video/etc where the list prohibits them, converting HTML/
> > wordprocessor attachments to plain text, removing MIME parts whose
> > Content-Type doesn't match filename or perhaps file(1) magic in the
> > content, etc.
>
> Just to mention it:
> IF we are going to add MILTER functionality, a MILTER would be perfect to
> do MIME handling.
>
>
If we add MILTER functionality, we could implement at least basic anti-spam
filtering there as well, couldn't we? A few days ago we were discussing
MILTERs in the anti-spam context. A piece of both anti-spam AND anti-abuse
handling could be implemented at this level.

I have implemented a binary Bayesian classifier that labels an email as
either spam or not spam. It could be applied here by using the main keywords
of an email as the feature vector and training on the messages that were
reported as spam in the logs. When an email is classified as spam, we could
display a warning line such as "This may be spam. Please be careful while
clicking on links or replying to this email with sensitive information!".
That way the MILTER would be useful for more than MIME handling.
Correct me if I am wrong. :)
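
To make the idea concrete, here is a minimal sketch of such a classifier in
plain Python. It is not the classifier mentioned above and not Mailman or
MILTER code: the names NaiveBayesSpamFilter, tokenize, annotate and WARNING
are hypothetical, the model is a simplified naive Bayes that scores only the
tokens present in a message, and the training messages would in practice
come from the reported-spam logs rather than from hand-written strings.

```python
import math
import re
from collections import Counter

# Warning text taken from the proposal; where it is shown is an assumption.
WARNING = ("This may be spam. Please be careful while clicking on links "
           "or replying to this email with sensitive information!")


def tokenize(text):
    """Lowercase the text and return its set of distinct word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


class NaiveBayesSpamFilter:
    """Hypothetical two-class (spam / ham) naive Bayes filter."""

    def __init__(self):
        # Number of training messages per class that contain each token.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        # Number of training messages seen per class.
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one training message; label is 'spam' or 'ham'."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def _log_score(self, tokens, label):
        total_docs = sum(self.doc_counts.values())
        # Log prior with add-one smoothing over the two classes.
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        for tok in tokens:
            # Smoothed estimate of P(token present | label).
            p = (self.word_counts[label][tok] + 1) / (self.doc_counts[label] + 2)
            score += math.log(p)
        return score

    def is_spam(self, text):
        """Return True when the spam score exceeds the ham score."""
        tokens = tokenize(text)
        return self._log_score(tokens, "spam") > self._log_score(tokens, "ham")


def annotate(body, classifier):
    """Prepend the warning line to bodies that are classified as spam."""
    if classifier.is_spam(body):
        return WARNING + "\n\n" + body
    return body
```

A filter hook would call train() on each reported message and annotate() on
each incoming body. Working in log space avoids underflow on long messages,
and the add-one smoothing keeps unseen keywords from zeroing out a score.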

> p at rick
>

-- 
Yours Sincerely,

Mora Sreyantha Chary
Computer Engineering '14
National Institute of Technology Karnataka
Surathkal, India 575 025