[Mailman-Developers] Scrubber.py confusion, 2.1b3
Michael Meltzer
mjm@michaelmeltzer.com
Wed, 14 Aug 2002 05:21:40 -0400
BTW, reading over the patch, it looks like I had a tab expansion issue. Sorry, 5am blues :-) New one below.
MJM
Index: MimeDel.py
===================================================================
RCS file: /cvsroot/mailman/mailman/Mailman/Handlers/MimeDel.py,v
retrieving revision 2.1
diff -u -r2.1 MimeDel.py
--- MimeDel.py 18 Apr 2002 20:46:53 -0000 2.1
+++ MimeDel.py 14 Aug 2002 09:19:29 -0000
@@ -33,7 +33,9 @@
from Mailman import Errors
from Mailman.Logging.Syslog import syslog
from Mailman.Version import VERSION
-
+from Mailman.Handlers.Scrubber import save_attachment
+from time import strftime
+from Mailman.i18n import _
def process(mlist, msg, msgdata):
@@ -41,6 +43,7 @@
     if not mlist.filter_content or not mlist.filter_mime_types:
         return
     # We also don't care about our own digests or plaintext
+    make_attachment(mlist, msg)
     ctype = msg.get_type('text/plain')
     mtype = msg.get_main_type('text')
     if msgdata.get('isdigest') or ctype == 'text/plain':
@@ -54,7 +57,7 @@
     if msg.is_multipart():
         # Recursively filter out any subparts that match the filter list
         prelen = len(msg.get_payload())
-        filter_parts(msg, filtertypes)
+        filter_parts(mlist, msg, filtertypes)
         # If the outer message is now an emtpy multipart (and it wasn't
         # before!) then, again it gets discarded.
         postlen = len(msg.get_payload())
@@ -96,7 +99,7 @@
-def filter_parts(msg, filtertypes):
+def filter_parts(mlist, msg, filtertypes):
     # Look at all the message's subparts, and recursively filter
     if not msg.is_multipart():
         return 1
@@ -104,9 +107,12 @@
     prelen = len(payload)
     newpayload = []
     for subpart in payload:
-        keep = filter_parts(subpart, filtertypes)
+        keep = filter_parts(mlist, subpart, filtertypes)
         if not keep:
             continue
+        if make_attachment(mlist, subpart):
+            newpayload.append(subpart)
+            continue
         ctype = subpart.get_type('text/plain')
         mtype = subpart.get_main_type('text')
         if ctype in filtertypes or mtype in filtertypes:
@@ -164,3 +170,32 @@
subpart.set_type('text/plain')
changedp = 1
return changedp
+
+
+
+def make_attachment(mlist, subpart):
+    # should be set from mlist, work in progress
+    # BTW this will act real stupid with multipart, it needs the real object, not the housekeeping
+    attach_filter = ['image/bmp', 'image/jpeg', 'image/tiff', 'image/gif', 'image/png', 'image/pjpeg', 'image/x-png', 'image/x-wmf']
+    ctype = subpart.get_type('text/plain')
+    mtype = subpart.get_main_type('text')
+    if ctype in attach_filter or mtype in attach_filter:
+        cctype = subpart.get_type()
+        # size is off (the payload is still encoded); just could not stand to call decode to correct it,
+        # might just take off 20% and be done
+        size = len(subpart.get_payload())
+        desc = subpart.get('content-description', (_('not available')))
+        filename = subpart.get_filename(_('not available'))
+        url = save_attachment(mlist, subpart, strftime("attch/%Y%m/%d"))
+        del subpart['content-type']
+        del subpart['content-transfer-encoding']
+        del subpart['content-disposition']
+        del subpart['content-description']
+        subpart.add_header('Content-Type', 'text/plain', charset='us-ascii')
+        subpart.add_header('Content-Transfer-Encoding', '7bit')
+        subpart.set_payload(_("""\
+Name: %(filename)s Type: %(cctype)s Size: %(size)d bytes Desc: %(desc)s
+Url: %(url)s
+"""))
+        return 1
+    else:
+        return 0
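
For anyone who wants to poke at the idea outside of Mailman, here is a minimal standalone sketch of what make_attachment does, written against the current standard-library email package rather than the 2002 API. It is not the patch itself. fake_save_attachment is a hypothetical stand-in for Scrubber's save_attachment (it only builds a URL and writes nothing), the example.invalid host is made up, and the size is taken from the decoded payload, which is one possible way to address the "size is off" comment in the patch.

```python
from time import strftime

# Same image types the patch hardwires; in a real setup these would
# presumably come from the list configuration.
ATTACH_FILTER = {
    "image/bmp", "image/jpeg", "image/tiff", "image/gif",
    "image/png", "image/pjpeg", "image/x-png", "image/x-wmf",
}


def fake_save_attachment(part, subdir):
    """Hypothetical stand-in for save_attachment: returns a URL, saves nothing."""
    name = part.get_filename("attachment")
    return "http://lists.example.invalid/%s/%s" % (subdir, name)


def make_attachment(part, save=fake_save_attachment):
    """Replace an image part with a text/plain summary. Return True if replaced."""
    ctype = part.get_content_type()
    if ctype not in ATTACH_FILTER:
        return False
    # decode=True undoes base64/quoted-printable, so this is the real byte count.
    size = len(part.get_payload(decode=True) or b"")
    desc = part.get("content-description", "not available")
    filename = part.get_filename("not available")
    url = save(part, strftime("attch/%Y%m/%d"))
    # Drop the headers that described the image, then relabel the part as text.
    for header in ("content-type", "content-transfer-encoding",
                   "content-disposition", "content-description"):
        del part[header]
    part.add_header("Content-Type", "text/plain", charset="us-ascii")
    part.add_header("Content-Transfer-Encoding", "7bit")
    part.set_payload(
        "Name: %s Type: %s Size: %d bytes Desc: %s\nUrl: %s\n"
        % (filename, ctype, size, desc, url)
    )
    return True
```

Passing the save function in as a parameter is only there so the sketch can run without a Mailman install; the real patch calls save_attachment(mlist, subpart, ...) directly.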
----- Original Message -----
From: "Michael Meltzer" <mjm@michaelmeltzer.com>
To: "Barry A. Warsaw" <barry@python.org>
Cc: <Mailman-Developers@python.org>
Sent: Wednesday, August 14, 2002 5:02 AM
Subject: Re: [Mailman-Developers] Scrubber.py confusion, 2.1b3
> save_attachment is looking good, "Cool". My only gripe is that the URLs are getting very long; 80-column wrap will be an ongoing
> issue and most likely unsolvable. I am not married to the path layout I used. I did have a problem with using the fully
> qualified date: after 3 years there would be over 1000 files in one directory.
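
To make the directory arithmetic above concrete, here is a small sketch. The format string is the one from the patch; the helper names are made up for illustration, and the counts assume one directory per day with no gaps.

```python
import time


def attachment_subdir(when=None):
    """Return a month/day bucketed path such as 'attch/200208/14'."""
    if when is None:
        when = time.localtime()
    return time.strftime("attch/%Y%m/%d", when)


def flat_daily_dirs(years):
    """Entries in one directory if every day gets its own top-level directory."""
    return 365 * years


def monthly_dirs(years):
    """Top-level entries when days are nested under a per-month directory."""
    return 12 * years
```

With days nested under months, three years gives 36 top-level entries with at most 31 day directories in each, instead of roughly 1095 entries side by side.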
>
> I am not sure about white vs. black list. The white list is nice because I know what types will pass through, but it will have
> the problem of playing catch-up with new types, a hassle factor for the admins, and questions from new users. The black list is
> nice, but will I wake up one morning and read about the "latest hole" that is being exploited? Could ruin a whole day ;-)
> Pondering it, I suspect a white list with a good set of defaults should work. I kind of like the "get the extension from the
> MIME type" approach, but it broke down as soon as I tried to attach a "Word" document: it came up as application/octet-stream
> with only the extension as a clue. I like the method but I do not think it will last; we will end back up at lists (or maybe a
> real open-source anti-virus :-)
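
One way the white-list idea could look in code, as a sketch only: take the extension from the MIME type when the type is specific, and fall back to a white list of filename extensions when the type is the generic application/octet-stream. The tables and the pick_extension name are assumptions for illustration, not anything in Mailman.

```python
import os

# Hypothetical defaults; a real list admin would tune both tables.
TYPE_TO_EXT = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
}
ALLOWED_EXTS = {".jpg", ".png", ".gif", ".txt"}
GENERIC_TYPE = "application/octet-stream"


def pick_extension(ctype, filename):
    """Return the extension to write the attachment with, or None to block it."""
    ctype = ctype.lower()
    ext = TYPE_TO_EXT.get(ctype)
    if ext is not None:
        # Specific, white-listed MIME type: trust the type, not the filename.
        return ext
    if ctype == GENERIC_TYPE and filename:
        # Generic type: the filename extension is the only clue, so it has
        # to be on the extension white list itself.
        ext = os.path.splitext(filename)[1].lower()
        if ext in ALLOWED_EXTS:
            return ext
    return None
```

Anything not on either list comes back as None, so a new type is blocked until an admin adds it, which is the catch-up cost mentioned above.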
>
> MJM
> PS. I am sure I will get the pointy hat award for the patch below :-) I also have it running on the test server at
> http://www.michaelmeltzer.com/mailman/listinfo/meltzer-list , it is open (at least for a few days :-), if anyone wants to pass some
> traffic through it and see the output. Just do not flood it out.
>
>
>
>
>
> Index: MimeDel.py
> ===================================================================
> RCS file: /cvsroot/mailman/mailman/Mailman/Handlers/MimeDel.py,v
> retrieving revision 2.1
> diff -u -r2.1 MimeDel.py
> --- MimeDel.py 18 Apr 2002 20:46:53 -0000 2.1
> +++ MimeDel.py 14 Aug 2002 08:21:58 -0000
> @@ -33,7 +33,9 @@
> from Mailman import Errors
> from Mailman.Logging.Syslog import syslog
> from Mailman.Version import VERSION
> -
> +from Mailman.Handlers.Scrubber import save_attachment
> +from time import strftime
> +from Mailman.i18n import _
>
>
> def process(mlist, msg, msgdata):
> @@ -41,6 +43,7 @@
> if not mlist.filter_content or not mlist.filter_mime_types:
> return
> # We also don't care about our own digests or plaintext
> + make_attachment(mlist, msg)
> ctype = msg.get_type('text/plain')
> mtype = msg.get_main_type('text')
> if msgdata.get('isdigest') or ctype == 'text/plain':
> @@ -54,7 +57,7 @@
> if msg.is_multipart():
> # Recursively filter out any subparts that match the filter list
> prelen = len(msg.get_payload())
> - filter_parts(msg, filtertypes)
> + filter_parts(mlist, msg, filtertypes)
> # If the outer message is now an emtpy multipart (and it wasn't
> # before!) then, again it gets discarded.
> postlen = len(msg.get_payload())
> @@ -96,7 +99,7 @@
>
>
>
> -def filter_parts(msg, filtertypes):
> +def filter_parts(mlist, msg, filtertypes):
> # Look at all the message's subparts, and recursively filter
> if not msg.is_multipart():
> return 1
> @@ -104,9 +107,12 @@
> prelen = len(payload)
> newpayload = []
> for subpart in payload:
> - keep = filter_parts(subpart, filtertypes)
> + keep = filter_parts(mlist, subpart, filtertypes)
> if not keep:
> continue
> + if make_attachment(mlist, subpart):
> + newpayload.append(subpart)
> + continue
> ctype = subpart.get_type('text/plain')
> mtype = subpart.get_main_type('text')
> if ctype in filtertypes or mtype in filtertypes:
> @@ -164,3 +170,32 @@
> subpart.set_type('text/plain')
> changedp = 1
> return changedp
> +
> +
> +
> +def make_attachment(mlist, subpart):
> + #should be set from mlist, work in progress
> + #BTW this will act real stupid with mulipart, it need the real object not the house keeping
> + attach_filter = ['image/bmp', 'image/jpeg', 'image/tiff', 'image/gif', 'image/png', 'image/pjpeg', 'image/x-png', 'image/x-wmf']
> + ctype = subpart.get_type('text/plain')
> + mtype = subpart.get_main_type('text')
> + if ctype in attach_filter or mtype in attach_filter:
> + cctype = subpart.get_type()
> + #size is off, just could not stand to call decode to correct, might just take off 20% and be done
> + size = len(subpart.get_payload())
> + desc = subpart.get('content-description', (_('not available')))
> + filename = subpart.get_filename(_('not available'))
> + url = save_attachment(mlist, subpart, strftime("attch/%Y%m/%d"))
> + del subpart['content-type']
> + del subpart['content-transfer-encoding']
> + del subpart['content-disposition']
> + del subpart['content-description']
> + subpart.add_header('Content-Type', 'text/plain', charset='us-ascii')
> + subpart.add_header('Content-Transfer-Encoding', '7bit')
> + subpart.set_payload(_("""\
> +Name: %(filename)s Type: %(cctype)s Size: %(size)d bytes Desc: %(desc)s
> +Url: %(url)s
> +"""))
> + return 1
> + else:
> + return 0
>
>
>
>
>
> ----- Original Message -----
> From: "Barry A. Warsaw" <barry@python.org>
> To: "Michael Meltzer" <mjm@michaelmeltzer.com>
> Cc: <Mailman-Developers@python.org>
> Sent: Tuesday, August 13, 2002 11:38 AM
> Subject: Re: [Mailman-Developers] Scrubber.py confusion, 2.1b3
>
>
> >
> > >>>>> "MM" == Michael Meltzer <mjm@michaelmeltzer.com> writes:
> >
> > MM> Actually I am "reusing" the code from Scrubber.py in MimeDel.py
> > MM> to turn attachments into links :-) I hardwired it for image
> > MM> types but it is generic enough. Some sample output from my
> > MM> "staging":
> >
> > MM> Name: beach.jpg Type: image/jpeg Size: 18853 bytes Desc:
> > MM> not_available Url:
> > MM> http://www.michaelmeltzer.com/pipermail/meltzer-list/attachments/200208/12/beach.jpg-0005.jpe
> >
> > Cool. I'm using a slightly different naming algorithm for the path.
> >
> > MM> It turned out to be a 4 line hack to filter_parts, 1 line at
> > MM> the top and 10 lines to reformat the payload; the rest came
> > MM> from save_attachment, very handy :-)
> >
> > Can you try to update it to current cvs? If it's really a 4 line
> > hack, you've got to post it. :) I tried to write the Scrubber.py
> > updates with you in mind, by factoring out some other functionality
> > you might need.
> >
> > MM> I have to admit the environment is nice to work in.
> >
> > :)
> >
> > MM> I am not sure my code is up to patch quality :-) The next step
> > MM> would be a modification to the content filter page for the
> > MM> types it should react to.
> >
> > MM> I would also suggest (Scrubber.py needs this too) that the
> > MM> filter pages list the extensions that it is allowed to write. Or
> > MM> the converse, the extensions it should not write;
> > MM> http://office.microsoft.com/Assistance/2000/Out2ksecFAQ.aspx would
> > MM> be my start :-), save the masses someday :-)
> >
> > I've been thinking about this. I vaguely remember that someone did a
> > patch to support pass-or-block semantics to the filter, but I can't
> > put my finger on it now. I want to link Dan Mick's name to that, but
> > does this ring a bell with anyone?
> >
> > MM> The issue with the directory is the number of files, not a
> > MM> name clash
> >
> > Yep, I know.
> >
> > MM> , `ls -d archives/private/listname/attachments/* |
> > MM> wc -l` > 1000 I think system performance will be
> > MM> affected. Above 10,000 I know it would (it would also be a
> > MM> problem for the http server on access). I can understand
> > MM> keeping the attachments from each email in their own directory,
> > MM> but this way the "files version control" :-) groups them
> > MM> together for access (assuming least recency theory) and makes
> > MM> cleaning out for space/inodes simple. It was just strftime
> > MM> welded on.
> >
> > I'm not sure I followed all that, but the current Scrubber.py does add
> > the date directory to the path, so I think we're good here.
> >
> > -Barry
>
>
> _______________________________________________
> Mailman-Developers mailing list
> Mailman-Developers@python.org
> http://mail.python.org/mailman-21/listinfo/mailman-developers