[Mailman-Developers] Scrubber mungs Quoted Printable

Tokio Kikuchi tkikuchi at is.kochi-u.ac.jp
Thu Dec 1 02:26:21 CET 2005


Hi Mark,

I once used this patch for Japanese Mailman.  Barry rejected this
re-generation because it may impose a heavy load (?).  The hack should
simplify the charset gotcha just below the patched lines.  Otherwise, we
may have to introduce a new variable in the email.Message.Message class
to track whether the payload has been decoded.  IMHO, mailing list
messages should be plain text without attachments, and those who
attach should pay (the load) for it.


--- Scrubber.py.orig    Thu Dec  1 10:01:45 2005
+++ Scrubber.py Thu Dec  1 10:13:17 2005
@@ -28,6 +28,7 @@
  from cStringIO import StringIO
  from types import IntType, StringType

+from email import message_from_string
  from email.Utils import parsedate
  from email.Parser import HeaderParser
  from email.Generator import Generator
@@ -313,6 +314,9 @@
  Url : %(url)s
  """), lcset)
          outer = False
+    # Re-generation of message instance from stringified one.
+    # This should normalize the payloads.
+    msg = message_from_string(msg.as_string())
      # We still have to sanitize multipart messages to flat text because
      # Pipermail can't handle messages with list payloads.  This is a kludge;
      # def (n) clever hack ;).
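
As a minimal sketch of what the re-generation does (this is not the
patch itself): the helper name regenerate() and the sample message are
made up for illustration, and the module names are the lowercase ones
used by later Python versions.  How set_payload() stores the payload
varies between Python versions, so the sketch relies only on the round
trip through as_string().

```python
from email import message_from_string
from email.message import Message


def regenerate(msg):
    """Flatten msg to text and parse it back (hypothetical helper)."""
    return message_from_string(msg.as_string())


# Sample message, loosely modeled on the one quoted below.
original = Message()
original['Subject'] = 'HTML - all'
original.set_payload(
    'some ==== and a few words.\n\nX=91**2 (x is 91 squared)\n',
    'iso-8859-1')

normalized = regenerate(original)

# After the round trip the stored payload is the encoded text that
# was written out, so decode=True recovers the original bytes.
decoded = normalized.get_payload(decode=True)
```

After the round trip, the payload held by the message and its
Content-Transfer-Encoding header agree, so get_payload(decode=True)
should return the original text intact.  The cost is one extra flatten
and parse per message, which is presumably the load in question.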


Mark Sapiro wrote:
> Mark Sapiro wrote:
> 
>>I think the fix for the current problem is the following patch -
>>
>>--- mailman-2.1.6/Mailman/Handlers/Scrubber.py
>>+++ mailman-mas/Mailman/Handlers/Scrubber.py
>>@@ -376,9 +376,8 @@
>>        # Now join the text and set the payload
>>        sep = _('-------------- next part --------------\n')
>>        del msg['content-type']
>>-        msg.set_payload(sep.join(text), charset)
>>        del msg['content-transfer-encoding']
>>-        msg.add_header('Content-Transfer-Encoding', '8bit')
>>+        msg.set_payload(sep.join(text), charset)
>>    return msg
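
The reordering matters because, as far as I can tell, set_payload()
with a charset only adds a Content-Transfer-Encoding header when the
message does not already have one.  A minimal sketch of that header
behavior follows.  The helper name and the stale sample headers are
invented for illustration, and only the resulting headers are
inspected, since how the payload itself is stored differs between
Python versions.

```python
from email.message import Message


def flat_text_message(text, charset, delete_first):
    """Build a message with stale MIME headers, then set a text payload.

    Hypothetical helper: delete_first selects the patched ordering.
    """
    msg = Message()
    # Stale headers, as if left over from the original message.
    msg['Content-Type'] = 'text/html'
    msg['Content-Transfer-Encoding'] = 'base64'
    if delete_first:
        del msg['content-type']
        del msg['content-transfer-encoding']
    msg.set_payload(text, charset)
    return msg


patched = flat_text_message('X=91**2\n', 'iso-8859-1', delete_first=True)
stale = flat_text_message('X=91**2\n', 'iso-8859-1', delete_first=False)

print(patched['Content-Transfer-Encoding'])
print(stale['Content-Transfer-Encoding'])
```

With the headers deleted first, set_payload() picks the transfer
encoding that goes with the charset.  Otherwise the stale header
survives and no longer describes the payload.
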
> 
> 
> I still think this is the correct fix, but it turns out there are some
> tricky issues here that I believe come down to an error in the
> set_payload() method.
> 
> Under certain circumstances, in particular when charset is 'iso-8859-1',
> 
>     msg.set_payload(text, charset)
> 
> 'apparently' encodes the text as quoted-printable and adds a
> 
> Content-Transfer-Encoding: quoted-printable
> 
> header to msg. I say 'apparently' because if one prints msg or creates
> a Generator instance and writes msg to a file, the message is
> printed/written as a correct, quoted-printable encoded message, but
> 
>     text = msg._payload
> or
> 
>     text = msg.get_payload()
> 
> gives the original text, not quoted-printable encoded, and
> 
>     text = msg.get_payload(decode=1)
> 
> gives a quoted-printable decoding of the original text, which is munged
> if the original text contains '=' in certain contexts.
> 
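
The munging can be reproduced without the email package at all.  The
sketch below uses the standard quopri module on a shortened sample line
(the variable names are only for illustration) to show that decoding
text that was never encoded is what damages the '=' sequences.

```python
import quopri

original = b"some ==== and X=91**2"

# Proper round trip: encode first, then decode.
encoded = quopri.encodestring(original)
round_trip = quopri.decodestring(encoded)

# Roughly what happens in the problem case: quoted-printable decoding
# is applied to text that was never quoted-printable encoded.
munged = quopri.decodestring(original)

print(repr(encoded))
print(repr(round_trip))
print(repr(munged))
```

Here '=91' is read as the escape for byte 0x91 and each '==' collapses
to a single '=', which matches the damage shown in the transcript at
the end of this message.
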
> This is a problem for Mailman because if Scrubber is processing
> individual messages, the 'apparently' quoted-printable message gets
> passed ultimately to SMTPDirect which calls Decorate, and Decorate
> does msg.get_payload(decode=1) when adding the header and/or footer
> and can mung the message in the process.
> 
> There is also an issue with archiving when the archiver gets a
> multipart message which is subsequently flattened by Scrubber.
> 
> The following is a transcript of a Python interactive session that
> illustrates the above problems with set_payload() and get_payload().
> This session is with Python 2.4.1, but exactly the same behavior
> occurs with 2.3.4 and 2.4.2.
> 
> Python 2.4.1 (#1, May 27 2005, 18:02:40)
> [GCC 3.3.3 (cygwin special)] on cygwin
> Type "help", "copyright", "credits" or "license" for more information.
> 
> >>> import email
> >>>
> >>> msg = email.message_from_file(open('plain2.eml'))
> >>>
> >>> print msg
> 
> From nobody Mon Nov 28 09:18:41 2005
> From: "Mark Sapiro" <msapiro at value.net>
> To: list1 at localhost
> Subject: HTML - all
> Date: Sun, 27 Nov 2005 09:02:33 -0800
> MIME-Version: 1.0
> Content-Type: text/plain; charset="iso-8859-1"
> 
> 
> How about just a line of stuff with some ==== and a few words.
> 
> X=91**2 (x is 91 squared)
> 
> 
> >>> del msg['content-type']
> >>> del msg['content-transfer-encoding']
> >>> msg.set_payload(str(msg.get_payload()), 'iso-8859-1')
> >>>
> >>> print msg
> 
> From nobody Mon Nov 28 09:18:41 2005
> From: "Mark Sapiro" <msapiro at value.net>
> To: list1 at localhost
> Subject: HTML - all
> Date: Sun, 27 Nov 2005 09:02:33 -0800
> MIME-Version: 1.0
> Content-Type: text/plain; charset="iso-8859-1"
> Content-Transfer-Encoding: quoted-printable
> 
> 
> How about just a line of stuff with some =3D=3D=3D=3D and a few words.
> 
> X=3D91**2 (x is 91 squared)
> 
> 
> >>> print msg.get_payload()
> 
> 
> How about just a line of stuff with some ==== and a few words.
> 
> X=91**2 (x is 91 squared)
> 
> 
> >>> print msg.get_payload(decode=1)
> 
> 
> How about just a line of stuff with some == and a few words.
> 
> X`**2 (x is 91 squared)
> 


-- 
Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp
http://weather.is.kochi-u.ac.jp/

