[Mailman-Developers] Multipart logic bug (was: Why Multipart with 7bit messages?)

Tue Jan 6 17:44:00 EST 2004

Michael Heydekamp <my at freexp.de> wrote on 06.01.04:

> I just realized that Mailman 2.1.3 sometimes (and IMO unnecessarily)
> creates Multipart messages upon appending the mailing list footer.

> After some investigation I found out that it does so only if the mail
> body does not contain any 8bit characters.  If it contains 8bit
> characters then everything is fine.

> Is this a bug or by design [...]

It's both. :-)

I found the relevant code in Handlers/Decorate.py - this is the place
where the footer is just concatenated rather than appended as a separate
MIME subpart:

----------8<----------
    if not msg.is_multipart() and msgtype == 'text/plain' and \
           msg.get('content-transfer-encoding', '').lower() <> 'base64' and \
           (lcset == 'us-ascii' or mcset == lcset):
        oldpayload = msg.get_payload()
    [...]
----------8<----------

The reason why Mailman created a multipart message (although the message
as well as the footer were in plain ASCII!) is that the
preferred_language of the list is set to German.  This evidently
corresponds to the charset ISO-8859-1 in Mailman, so the variable
'lcset' above was set to 'iso-8859-1' and the condition evaluated to
false (as 'mcset == lcset' was false as well).

If the message did contain 8bit characters (and was declared as
ISO-8859-1), then 'mcset == lcset' returned true and the footer was
concatenated.

Funny: had it been declared as ISO-8859-15 or UTF-8, the footer would
again have been appended unnecessarily as a separate MIME subpart.
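
To make the cases above easier to follow, here is a minimal sketch of
just the charset clause of that condition, written in current Python.
The function name is made up for illustration, the other checks
(multipart, text/plain, base64) are assumed to have passed, and I am
assuming that 'mcset' is 'us-ascii' for a plain ASCII message:

```python
def footer_is_concatenated(lcset, mcset):
    """Charset clause only (hypothetical helper, not Mailman code).

    lcset: charset of the list's preferred language
    mcset: charset declared by the message
    """
    return lcset == "us-ascii" or mcset == lcset


# List language German, so lcset is 'iso-8859-1' in every case below.
for mcset in ("us-ascii", "iso-8859-1", "iso-8859-15", "utf-8"):
    result = footer_is_concatenated("iso-8859-1", mcset)
    print(mcset, "->", "concatenated" if result else "separate MIME subpart")
```

Only the 'iso-8859-1' message gets the footer concatenated; the plain
ASCII one does not, which is the behaviour I ran into.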


The following is just a temporary hack, not a real fix (but currently
works for me):

----------8<----------

--- Decorate_old.py     Fri Oct  3 01:56:28 2003
+++ Decorate_new.py     Tue Jan  6 22:37:22 2004
@@ -79,7 +79,7 @@
     wrap = 1
     if not msg.is_multipart() and msgtype == 'text/plain' and \
            msg.get('content-transfer-encoding', '').lower() <> 'base64' and \
-           (lcset == 'us-ascii' or mcset == lcset):
+           (lcset == 'us-ascii' or lcset == 'iso-8859-1' or mcset == lcset):
         oldpayload = msg.get_payload()
         frontsep = endsep = ''
         if header and not header.endswith('\n'):
----------8<----------


Please note that this logic is still broken and should be fixed in the
next version for the following reasons:

1. The preferred_charset of the list does not necessarily have to be
   exactly the same as the charset of the message.  The charset of the
   message can be different from the charset of the list but still be
   of the same charset family - or not.  This needs to be checked.
   Whether the preferred_charset is US-ASCII is completely irrelevant,
   IMO.

2. The code does not even check if the footer contains 8bit characters.
   If the language of the list is set to English (= US-ASCII) but the
   footer contains 8bit characters, the current routine would
   concatenate it to a mail declared with US-ASCII rather than appending
   a separate MIME subpart. This would clearly lead to a wrongly
   declared 8bit message.

3. IANA charset aliases are ignored.

4. Another problem that I'm seeing is that I'm not sure if it's OK to
   make assumptions such as "Germany == ISO-8859-1".  If I'm adding a
   footer in the admin web interface it will most likely be in the local
   charset of my machine, right?
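
Regarding point 3, one possible way to compare charset names
independently of their spelling is to resolve both through Python's
codec registry.  This is only a sketch: the helper name is hypothetical,
and Python's alias table is not guaranteed to match the IANA registry
entry for entry.

```python
import codecs


def same_charset(a, b):
    """Return True if both names resolve to the same Python codec.

    Hypothetical helper.  Unknown charset names are treated as
    "not the same" instead of raising an error.
    """
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        return False


print(same_charset("latin1", "ISO-8859-1"))
print(same_charset("iso-8859-1", "iso-8859-15"))
```

With something like this, 'mcset == lcset' would no longer fail just
because one side says 'latin1' and the other 'iso-8859-1'.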


Thus, IMHO a footer may be concatenated only (but then always), if

a) the footer is in plain ASCII, or

b) the footer is of the same charset family as the declared charset of
   the message *and* does not contain any illegal characters (for
   instance Win-1252 characters in the range 80h to 9Fh if the message
   has an ISO-8859 charset where these characters are reserved). IANA
   aliases should be considered.
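
Rules a) and b) could be sketched roughly as follows.  This is not
Mailman code and not a tested patch.  The function name is made up, and
I am assuming the footer is already available as decoded text, so that
"same charset family" reduces to "the footer can be encoded in the
charset the message declares":

```python
import codecs


def may_concatenate(footer, msg_charset):
    """Decide whether `footer` (a str) may be appended in place to a
    body declared as `msg_charset`.  Hypothetical helper."""
    # Rule a): a plain ASCII footer may always be concatenated.
    try:
        footer.encode("ascii")
        return True
    except UnicodeEncodeError:
        pass

    # Rule b): the footer must be representable in the message charset.
    # codecs.lookup() also resolves the aliases Python knows about.
    try:
        codec = codecs.lookup(msg_charset)
        encoded = footer.encode(codec.name)
    except (LookupError, UnicodeEncodeError):
        return False

    # In the ISO-8859 charsets the range 80h to 9Fh is reserved, so
    # reject footers that would put bytes there.
    if codec.name.startswith("iso8859"):
        return not any(0x80 <= byte <= 0x9F for byte in encoded)
    return True
```

Whenever this returns False, the footer would go into a separate MIME
subpart as it does today.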


As I'm not familiar with Python I could not produce better code on my
own, sorry.


        Michael