[Mailman-Developers] Scrubber mungs quoted-printable revisited.

Mark Sapiro msapiro at value.net
Sat Jan 28 06:10:53 CET 2006


To refresh, see
<http://mail.python.org/pipermail/mailman-developers/2005-November/018395.html>
and
<http://mail.python.org/pipermail/mailman-developers/2005-December/018449.html>.

The problem with set_payload() creating a message for which a
subsequent get_payload() did 'too much' decoding was fixed by having
Scrubber.py add an 'X-Mailman-Scrubbed: Yes' header upon doing a
set_payload(), and then, in the various places where there are
subsequent get_payload() calls, setting the decode flag according to
the presence or absence of the header.
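
As a minimal sketch of that header-based approach (the helper name
payload_respecting_scrub_flag is hypothetical and is not in Mailman; the
direction of the test is an assumption inferred from the 'too much'
decoding described above, i.e. skip decoding when the header is present):

```python
from email.message import Message

SCRUBBED_HEADER = 'X-Mailman-Scrubbed'

def payload_respecting_scrub_flag(msg):
    """Hypothetical helper: choose the decode flag from the scrub header.

    Assumption: a scrubbed part carries an undecoded-safe payload, so it
    must not be decoded again; any other part is decoded as usual.
    """
    decode = SCRUBBED_HEADER not in msg   # header lookup is case-insensitive
    return msg.get_payload(decode=decode)

# A part whose payload really is quoted-printable encoded.
part = Message()
part['Content-Transfer-Encoding'] = 'quoted-printable'
part.set_payload('a=3Db')
decoded = payload_respecting_scrub_flag(part)    # decode=True path

# The same part once it is marked as scrubbed.
part[SCRUBBED_HEADER] = 'Yes'
untouched = payload_respecting_scrub_flag(part)  # decode=False path
```

With a current Python 3 email package the decoded result is a bytes
object, while the undecoded payload comes back as the stored string.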

I actually like the header as an explanation of content change along
the lines of X-Content-Filtered-By:, but I've been uneasy about
setting the get_payload() decode flag based on the presence or absence
of 'X-Mailman-Scrubbed: Yes', although it seems to work in all cases
we've tried.

Recently, I looked in more detail at the actual set_payload() method in
the email library and I have at least a vague understanding of the
problem. The problem and my understanding are reported at
<http://sourceforge.net/tracker/?func=detail&aid=1409455&group_id=5470&atid=105470>.
I have suggested a patch there which I call a 'Hint at possible fix'.
This patch could be applied in Scrubber.py.

The patch to Scrubber.py would append a few lines to the end of the

def replace_payload_by_text(msg, text, charset):

definition, so that the whole definition becomes

def replace_payload_by_text(msg, text, charset):
    # TK: This is a common function in replacing the attachment and the
    # main message by a text (scrubbing).  Also, add a flag indicating it
    # has been scrubbed.
    del msg['content-type']
    del msg['content-transfer-encoding']
    msg.set_payload(text, charset)
    msg['X-Mailman-Scrubbed'] = 'Yes'
    if msg.get('content-transfer-encoding') == 'quoted-printable':
        cset = msg.get_charset()
        if cset:
            msg._payload = cset.body_encode(msg._payload)
            msg._charset = None


The advantage of doing this in Scrubber.py and unconditionally setting
the decode flag for subsequent get_payload() calls is that it makes the
whole process insensitive to whether or when Python email bug #1409455
is fixed. If the bug is fixed, the payload will already be encoded and
msg.get_charset() above will return None, so the payload won't be
encoded a second time. The additional code above can be removed at
some point after we're sure the email library used by Mailman is fixed.
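
The 'encode only once' reasoning can be illustrated with a toy model.
This is only a sketch: ToyMessage and encode_once are hypothetical
stand-ins, not the email library's classes, and quopri.encodestring is
used here in place of the charset's body_encode().

```python
import quopri

class ToyMessage:
    """Toy stand-in (not email.Message): a payload plus a pending charset."""

    def __init__(self, payload, pending_charset):
        self.payload = payload                  # raw bytes
        self.pending_charset = pending_charset  # None once encoded

def encode_once(msg):
    # Same shape as the guard in the patch: encode only while a charset
    # is still pending, then clear it so a later call does nothing.
    if msg.pending_charset:
        msg.payload = quopri.encodestring(msg.payload)
        msg.pending_charset = None

toy = ToyMessage(b'a=b', 'iso-8859-1')
encode_once(toy)
after_first = toy.payload
encode_once(toy)            # second call is a no-op
after_second = toy.payload
```

Without the guard, a second encoding pass would turn the '=' of the
already encoded text into '=3D' again and mung the payload.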


-- 
Mark Sapiro <msapiro at value.net>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


