[Mailman-Developers] Scrubber mungs Quoted Printable

Mark Sapiro msapiro at value.net
Mon Nov 28 19:01:56 CET 2005


Mark Sapiro wrote:
>
>I think the fix for the current problem is the following patch -
>
>--- mailman-2.1.6/Mailman/Handlers/Scrubber.py
>+++ mailman-mas/Mailman/Handlers/Scrubber.py
>@@ -376,9 +376,8 @@
>         # Now join the text and set the payload
>         sep = _('-------------- next part --------------\n')
>         del msg['content-type']
>-        msg.set_payload(sep.join(text), charset)
>         del msg['content-transfer-encoding']
>-        msg.add_header('Content-Transfer-Encoding', '8bit')
>+        msg.set_payload(sep.join(text), charset)
>     return msg

I still think this is the correct fix, but it turns out there are some
tricky issues here that I believe come down to an error in the
set_payload() method.

Under certain circumstances, in particular when charset is 'iso-8859-1',

    msg.set_payload(text, charset)

'apparently' encodes the text as quoted-printable and adds a

Content-Transfer-Encoding: quoted-printable

header to msg. I say 'apparently' because if one prints msg or creates
a Generator instance and writes msg to a file, the message is
printed/written as a correctly quoted-printable-encoded message, but

    text = msg._payload
or

    text = msg.get_payload()

gives the original text, not the quoted-printable-encoded text, and

    text = msg.get_payload(decode=1)

gives the result of quoted-printable decoding that original, unencoded
text. That result is munged if the original text contains '=' in
certain combinations, such as '==' or '=' followed by two hex digits.
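To see why the decode=1 result is munged, the following minimal sketch
uses the standard-library quopri module (Python 3), which applies the
usual quoted-printable rules; it is an illustration, not the exact code
path the email package takes internally. Decoding text that was never
encoded turns '=91' into the single byte 0x91, whereas encoding first
escapes every '=' as '=3D' and round-trips cleanly.

```python
import quopri

# The body from the transcript below, as raw bytes that were never QP-encoded.
original = (b"How about just a line of stuff with some ==== and a few words.\n"
            b"\n"
            b"X=91**2 (x is 91 squared)\n")

# Decoding it anyway treats "=91" as the escape for byte 0x91, so "X=91"
# becomes b"X\x91"; runs of "=" are also collapsed by the decoder.
munged = quopri.decodestring(original)

# Encoding first escapes each "=" as "=3D", so decoding restores the input.
encoded = quopri.encodestring(original)
restored = quopri.decodestring(encoded)

print(munged)
print(encoded)
print(restored == original)
```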

This is a problem for Mailman because when Scrubber processes
individual messages, the 'apparently' quoted-printable message is
ultimately passed to SMTPDirect, which calls Decorate. Decorate calls
msg.get_payload(decode=1) when adding the header and/or footer, and
can mung the message in the process.
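A decode/modify/re-encode step like the one described above is only safe
if the stored payload really is in the encoding that its
Content-Transfer-Encoding header claims. The sketch below assumes
Python 3's email package; set_qp_body and add_footer are hypothetical
helpers, not Mailman's Decorate code. It stores the quoted-printable
form explicitly, so get_payload(decode=True) returns the original bytes
and a footer can be appended without munging.

```python
import quopri
from email.message import Message

CHARSET = "iso-8859-1"

def set_qp_body(msg, text, charset=CHARSET):
    """Hypothetical helper: store text QP-encoded, with headers that match."""
    del msg["Content-Type"]               # no error if the header is absent
    del msg["Content-Transfer-Encoding"]
    msg["Content-Type"] = 'text/plain; charset="%s"' % charset
    msg["Content-Transfer-Encoding"] = "quoted-printable"
    # Keep the *encoded* text in the payload so it agrees with the header.
    encoded = quopri.encodestring(text.encode(charset))
    msg.set_payload(encoded.decode("ascii"))

def add_footer(msg, footer, charset=CHARSET):
    """Hypothetical Decorate-style step: decode, append a footer, re-encode."""
    body = msg.get_payload(decode=True).decode(charset)
    set_qp_body(msg, body + footer, charset)

text = ("How about just a line of stuff with some ==== and a few words.\n"
        "\n"
        "X=91**2 (x is 91 squared)\n")
footer = "_______________\nhypothetical list footer\n"

msg = Message()
set_qp_body(msg, text)
add_footer(msg, footer)
print(msg.get_payload(decode=True).decode(CHARSET))
```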

There is also an issue with archiving when the archiver gets a
multipart message which is subsequently flattened by Scrubber.

The following is a transcript of a Python interactive session that
illustrates the above problems with set_payload() and get_payload().
This session is with Python 2.4.1, but exactly the same behavior
occurs with 2.3.4 and 2.4.2.

Python 2.4.1 (#1, May 27 2005, 18:02:40)
[GCC 3.3.3 (cygwin special)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import email
>>>
>>> msg = email.message_from_file(open('plain2.eml'))
>>>
>>> print msg
From nobody Mon Nov 28 09:18:41 2005
From: "Mark Sapiro" <msapiro at value.net>
To: list1 at localhost
Subject: HTML - all
Date: Sun, 27 Nov 2005 09:02:33 -0800
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"


How about just a line of stuff with some ==== and a few words.

X=91**2 (x is 91 squared)

>>>
>>> del msg['content-type']
>>> del msg['content-transfer-encoding']
>>> msg.set_payload(str(msg.get_payload()), 'iso-8859-1')
>>>
>>> print msg
From nobody Mon Nov 28 09:18:41 2005
From: "Mark Sapiro" <msapiro at value.net>
To: list1 at localhost
Subject: HTML - all
Date: Sun, 27 Nov 2005 09:02:33 -0800
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable


How about just a line of stuff with some =3D=3D=3D=3D and a few words.

X=3D91**2 (x is 91 squared)

>>>
>>> print msg.get_payload()

How about just a line of stuff with some ==== and a few words.

X=91**2 (x is 91 squared)

>>>
>>> print msg.get_payload(decode=1)

How about just a line of stuff with some == and a few words.

X`**2 (x is 91 squared)

-- 
Mark Sapiro <msapiro at value.net>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


