[Mailman-Developers] Scrubber.py confusion, 2.1b3

Wed, 14 Aug 2002 05:21:40 -0400

BTW, reading over the patch, it looks like I hit a tab expansion issue. Sorry, 5am blues :-) New one below.

MJM


Index: MimeDel.py
===================================================================
RCS file: /cvsroot/mailman/mailman/Mailman/Handlers/MimeDel.py,v
retrieving revision 2.1
diff -u -r2.1 MimeDel.py

--- MimeDel.py 18 Apr 2002 20:46:53 -0000 2.1
+++ MimeDel.py 14 Aug 2002 09:19:29 -0000
@@ -33,7 +33,9 @@
 from Mailman import Errors
 from Mailman.Logging.Syslog import syslog
 from Mailman.Version import VERSION
-
+from Mailman.Handlers.Scrubber import save_attachment
+from time import strftime
+from Mailman.i18n import _


 def process(mlist, msg, msgdata):
@@ -41,6 +43,7 @@
     if not mlist.filter_content or not mlist.filter_mime_types:
         return
     # We also don't care about our own digests or plaintext
+    make_attachment(mlist, msg)
     ctype = msg.get_type('text/plain')
     mtype = msg.get_main_type('text')
     if msgdata.get('isdigest') or ctype == 'text/plain':
@@ -54,7 +57,7 @@
     if msg.is_multipart():
         # Recursively filter out any subparts that match the filter list
         prelen = len(msg.get_payload())
-        filter_parts(msg, filtertypes)
+        filter_parts(mlist, msg, filtertypes)
         # If the outer message is now an emtpy multipart (and it wasn't
         # before!) then, again it gets discarded.
         postlen = len(msg.get_payload())
@@ -96,7 +99,7 @@



-def filter_parts(msg, filtertypes):
+def filter_parts(mlist, msg, filtertypes):
     # Look at all the message's subparts, and recursively filter
     if not msg.is_multipart():
         return 1
@@ -104,9 +107,12 @@
     prelen = len(payload)
     newpayload = []
     for subpart in payload:
-        keep = filter_parts(subpart, filtertypes)
+        keep = filter_parts(mlist, subpart, filtertypes)
         if not keep:
             continue
+        if make_attachment(mlist, subpart):
+            newpayload.append(subpart)
+            continue
         ctype = subpart.get_type('text/plain')
         mtype = subpart.get_main_type('text')
         if ctype in filtertypes or mtype in filtertypes:
@@ -164,3 +170,32 @@
         subpart.set_type('text/plain')
         changedp = 1
     return changedp
+
+
+
+def make_attachment(mlist, subpart):
+    # Should be set from mlist; work in progress.
+    # BTW this will act really stupid with multipart: it needs the real object, not the housekeeping.
+    attach_filter = ['image/bmp', 'image/jpeg', 'image/tiff', 'image/gif', 'image/png', 'image/pjpeg', 'image/x-png', 'image/x-wmf']
+    ctype = subpart.get_type('text/plain')
+    mtype = subpart.get_main_type('text')
+    if ctype in attach_filter or mtype in attach_filter:
+        cctype = subpart.get_type()
+        # size is off (it counts the still-encoded payload); just could not stand to call decode to correct it, might just take off 20% and be done
+        size = len(subpart.get_payload())
+        desc = subpart.get('content-description', (_('not available')))
+        filename = subpart.get_filename(_('not available'))
+        url = save_attachment(mlist, subpart, strftime("attch/%Y%m/%d"))
+        del subpart['content-type']
+        del subpart['content-transfer-encoding']
+        del subpart['content-disposition']
+        del subpart['content-description']
+        subpart.add_header('Content-Type', 'text/plain', charset='us-ascii')
+        subpart.add_header('Content-Transfer-Encoding', '7bit')
+        subpart.set_payload(_("""\
+Name: %(filename)s Type: %(cctype)s Size: %(size)d bytes Desc: %(desc)s
+Url: %(url)s
+"""))
+        return 1
+    else:
+        return 0
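
For anyone who wants to try the idea outside Mailman, here is a minimal
sketch of the same rewrite step using the modern stdlib email package. It is
not the patch itself: summarize_attachment and ATTACH_TYPES are hypothetical
names, the url is assumed to come from whatever stored the file
(save_attachment in the patch), and there is no i18n wrapper. Unlike the
patch, it calls get_payload(decode=True), so the reported size is the decoded
size rather than the encoded one.

```python
from email.message import EmailMessage

# Hypothetical stand-in for the patch's hard-wired attach_filter list.
ATTACH_TYPES = {"image/bmp", "image/gif", "image/jpeg", "image/png", "image/tiff"}


def summarize_attachment(part, url):
    """Replace an image part with a short text/plain summary, in place.

    Returns True if the part was rewritten, False if it was left alone.
    Storing the attachment and producing `url` is outside this sketch.
    """
    ctype = part.get_content_type()
    if ctype not in ATTACH_TYPES:
        return False
    # decode=True undoes base64/quoted-printable, so this is the real size.
    size = len(part.get_payload(decode=True) or b"")
    desc = part.get("Content-Description", "not available")
    filename = part.get_filename("not available")
    # Drop the headers that described the binary payload.
    for header in ("Content-Type", "Content-Transfer-Encoding",
                   "Content-Disposition", "Content-Description"):
        del part[header]
    part.set_payload(
        "Name: %s Type: %s Size: %d bytes Desc: %s\nUrl: %s\n"
        % (filename, ctype, size, desc, url)
    )
    part["Content-Type"] = 'text/plain; charset="us-ascii"'
    part["Content-Transfer-Encoding"] = "7bit"
    return True


# Usage: a two-part message whose second part is a 100-byte fake PNG.
msg = EmailMessage()
msg.set_content("see attached")
msg.add_attachment(b"\x89PNG" + b"\0" * 96, maintype="image",
                   subtype="png", filename="beach.png")
text_part, image_part = msg.get_payload()
changed = summarize_attachment(image_part, "http://example.invalid/beach.png")
```

The text part is left untouched; only parts whose full content type is in the
set get replaced by the Name/Type/Size/Desc/Url summary.
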
----- Original Message -----
From: "Michael Meltzer" <mjm@michaelmeltzer.com>
To: "Barry A. Warsaw" <barry@python.org>
Cc: <Mailman-Developers@python.org>
Sent: Wednesday, August 14, 2002 5:02 AM
Subject: Re: [Mailman-Developers] Scrubber.py confusion, 2.1b3


> save_attachment is looking good, "Cool". My only gripe is that the URLs are getting very long; 80-column wrap will be an ongoing
> issue and most likely unsolvable. I am not married to the path layout I used. I did have a problem with using the fully
> qualified date: after 3 years there would be over 1000 files in one directory.
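
A minimal sketch of the date bucketing the patch passes to save_attachment,
for illustration only. dated_subdir is a hypothetical helper; the "attch"
prefix and the "%Y%m/%d" layout are taken from the patch. It uses gmtime so
the result does not depend on the local timezone, whereas the patch's bare
strftime call uses local time.

```python
import time


def dated_subdir(when=None, prefix="attch"):
    """Return a relative directory such as 'attch/200208/14'.

    `when` is seconds since the epoch; None means "now".
    """
    return time.strftime(prefix + "/%Y%m/%d", time.gmtime(when))


# 2002-08-14 12:00:00 UTC
example = dated_subdir(1029326400)
```

With a year-month level above the day level, a month directory never holds
more than 31 day subdirectories, and a whole month can be cleaned out by
removing one directory.
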
>
> I am not sure about white vs. black list. The white list is nice because I know what types will pass through, but it will have the
> problem of playing catch-up with new types, a hassle factor for the admins, and questions from new users. The black list is nice,
> but will I wake up one morning and read about the "latest hole" that is being exploited? Could ruin a whole day ;-) Pondering it, I
> suspect a white list with a good set of defaults should work. I kind of like the "get the extension from the MIME type" approach, but
> it broke down as soon as I tried to attach a "Word" document: it came up as application/octet-stream with only the extension as a
> clue. I like the method but I do not think it will last; we will end back up at lists (or maybe a real open-source anti-virus :-)
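
A rough sketch of the white-list idea with the octet-stream case handled by
falling back to the filename extension. Nothing here is Mailman
configuration: is_allowed, ALLOWED_TYPES and ALLOWED_EXTENSIONS are
hypothetical names and the default sets are only placeholders.

```python
import os

# Placeholder defaults; a real list would be set per mailing list.
ALLOWED_TYPES = {"image/gif", "image/jpeg", "image/png"}
ALLOWED_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png"}


def is_allowed(ctype, filename=None):
    """White-list check: pass only known types, block everything else."""
    ctype = ctype.lower()
    if ctype in ALLOWED_TYPES:
        return True
    # Generic binary type: the extension is the only clue left.
    if ctype == "application/octet-stream" and filename:
        ext = os.path.splitext(filename)[1].lower()
        return ext in ALLOWED_EXTENSIONS
    return False
```

Anything not named in either set is rejected by default, which is the
property that makes the white list safer than the black list when a new
exploitable type shows up.
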
>
> MJM
> PS. I am sure I will get the pointy hat award for the patch below :-) I also have it running on the test server at
> http://www.michaelmeltzer.com/mailman/listinfo/meltzer-list ; it is open (at least for a few days :-) if anyone wants to pass some
> traffic through it and see the output. Just do not flood it out.
>
>
>
>
>
> Index: MimeDel.py
> ===================================================================
> RCS file: /cvsroot/mailman/mailman/Mailman/Handlers/MimeDel.py,v
> retrieving revision 2.1
> diff -u -r2.1 MimeDel.py
> --- MimeDel.py 18 Apr 2002 20:46:53 -0000 2.1
> +++ MimeDel.py 14 Aug 2002 08:21:58 -0000
> @@ -33,7 +33,9 @@
>  from Mailman import Errors
>  from Mailman.Logging.Syslog import syslog
>  from Mailman.Version import VERSION
> -
> +from Mailman.Handlers.Scrubber import save_attachment
> +from time import strftime
> +from Mailman.i18n import _
>
>
>  def process(mlist, msg, msgdata):
> @@ -41,6 +43,7 @@
>      if not mlist.filter_content or not mlist.filter_mime_types:
>          return
>      # We also don't care about our own digests or plaintext
> +    make_attachment(mlist, msg)
>      ctype = msg.get_type('text/plain')
>      mtype = msg.get_main_type('text')
>      if msgdata.get('isdigest') or ctype == 'text/plain':
> @@ -54,7 +57,7 @@
>      if msg.is_multipart():
>          # Recursively filter out any subparts that match the filter list
>          prelen = len(msg.get_payload())
> -        filter_parts(msg, filtertypes)
> +        filter_parts(mlist, msg, filtertypes)
>          # If the outer message is now an emtpy multipart (and it wasn't
>          # before!) then, again it gets discarded.
>          postlen = len(msg.get_payload())
> @@ -96,7 +99,7 @@
>
>
>
> -def filter_parts(msg, filtertypes):
> +def filter_parts(mlist, msg, filtertypes):
>      # Look at all the message's subparts, and recursively filter
>      if not msg.is_multipart():
>          return 1
> @@ -104,9 +107,12 @@
>      prelen = len(payload)
>      newpayload = []
>      for subpart in payload:
> -        keep = filter_parts(subpart, filtertypes)
> +        keep = filter_parts(mlist, subpart, filtertypes)
>          if not keep:
>              continue
> + if make_attachment(mlist, subpart):
> +            newpayload.append(subpart)
> +     continue
>          ctype = subpart.get_type('text/plain')
>          mtype = subpart.get_main_type('text')
>          if ctype in filtertypes or mtype in filtertypes:
> @@ -164,3 +170,32 @@
>          subpart.set_type('text/plain')
>          changedp = 1
>      return changedp
> +
> +
> +
> +def make_attachment(mlist, subpart):
> +    # Should be set from mlist; work in progress.
> +    # BTW this will act really stupid with multipart: it needs the real object, not the housekeeping.
> +    attach_filter = ['image/bmp', 'image/jpeg', 'image/tiff', 'image/gif', 'image/png', 'image/pjpeg', 'image/x-png', 'image/x-wmf']
> +    ctype = subpart.get_type('text/plain')
> +    mtype = subpart.get_main_type('text')
> +    if ctype in attach_filter or mtype in attach_filter:
> + cctype = subpart.get_type()
> + #size is off, just could not stand to call decode to correct, might just take off 20% and be done
> +        size = len(subpart.get_payload())
> +        desc = subpart.get('content-description', (_('not available')))
> +        filename = subpart.get_filename(_('not available'))
> + url = save_attachment(mlist, subpart, strftime("attch/%Y%m/%d"))
> + del subpart['content-type']
> + del subpart['content-transfer-encoding']
> +        del subpart['content-disposition']
> +        del subpart['content-description']
> + subpart.add_header('Content-Type', 'text/plain', charset='us-ascii')
> + subpart.add_header('Content-Transfer-Encoding', '7bit')
> + subpart.set_payload(_("""\
> +Name: %(filename)s Type: %(cctype)s Size: %(size)d bytes Desc: %(desc)s
> +Url: %(url)s
> +"""))
> +        return 1
> +    else:
> +        return 0
>
>
>
>
>
> ----- Original Message -----
> From: "Barry A. Warsaw" <barry@python.org>
> To: "Michael Meltzer" <mjm@michaelmeltzer.com>
> Cc: <Mailman-Developers@python.org>
> Sent: Tuesday, August 13, 2002 11:38 AM
> Subject: Re: [Mailman-Developers] Scrubber.py confusion, 2.1b3
>
>
> >
> > >>>>> "MM" == Michael Meltzer <mjm@michaelmeltzer.com> writes:
> >
> >     MM> Actually I am "reusing" the code from Scrubber.py in MimeDel.py
> >     MM> to turn attachments into links :-) I hardwired it for image
> >     MM> types but it is generic enough. Some sample output from my
> >     MM> "staging":
> >
> >     MM> Name: beach.jpg Type: image/jpeg Size: 18853 bytes Desc:
> >     MM> not_available Url:
> >     MM> http://www.michaelmeltzer.com/pipermail/meltzer-list/attachments/200208/12/beach.jpg-0005.jpe
> >
> > Cool.  I'm using a slightly different naming algorithm for the path.
> >
> >     MM> It turned out to be a 4 line hack to filter_parts, 1 line at
> >     MM> the top and 10 lines to reformat the payload, the rest came
> >     MM> from save_attachment, very handy :-)
> >
> > Can you try to update it to current cvs?  If it's really a 4 line
> > hack, you've got to post it. :)  I tried to write the Scrubber.py
> > updates with you in mind, by factoring out some other functionality
> > you might need.
> >
> >     MM> I have to admit the environment is nice to work in.
> >
> > :)
> >
> >     MM> I am not sure my code is up to patch quality :-) The next step
> >     MM> would be a modification to the content filter page for the
> >     MM> type it should react to.
> >
> >     MM> I would also suggest (Scrubber.py needs this too) that the
> >     MM> filter pages list the extensions that it is allowed to write, or
> >     MM> conversely the extensions it should not write.
> >     MM> http://office.microsoft.com/Assistance/2000/Out2ksecFAQ.aspx would
> >     MM> be my start :-), save the masses someday :-)
> >
> > I've been thinking about this.  I vaguely remember that someone did a
> > patch to support pass-or-block semantics to the filter, but I can't
> > put my finger on it now.  I want to link Dan Mick's name to that, but
> > does this ring a bell with anyone?
> >
> >     MM> The issue with the directory is the number of files, not a
> >     MM> name clash
> >
> > Yep, I know.
> >
> >     MM> , `ls -d archives/private/listname/attachments/* |
> >     MM> wc -l` > 1000 I think system performance will be
> >     MM> affected. Above 10,000 I know it would (it would also be a
> >     MM> problem for the http server on access). I can understand
> >     MM> keeping the attachments from each email in their own directory,
> >     MM> but this way the "files version control" :-) groups them
> >     MM> together for access (assuming least recency theory) and makes
> >     MM> cleaning out for space/inodes simple. It was just strftime
> >     MM> welded on.
> >
> > I'm not sure I followed all that, but the current Scrubber.py does add
> > the date directory to the path, so I think we're good here.
> >
> > -Barry
>
>
> _______________________________________________
> Mailman-Developers mailing list
> Mailman-Developers@python.org
> http://mail.python.org/mailman-21/listinfo/mailman-developers