[ mailman-Bugs-759841 ] Multipart/mixed issues in archives

SourceForge.net noreply at sourceforge.net
Wed Nov 7 03:47:26 CET 2007


Bugs item #759841, was opened at 2003-06-24 10:22
Message generated for change (Comment added) made by rekt
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100103&aid=759841&group_id=103

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Pipermail
Group: 2.1 (stable)
Status: Open
Resolution: Fixed
Priority: 8
Private: No
Submitted By: Pug Bainter (phelim_gervase)
Assigned to: Nobody/Anonymous (nobody)
Summary: Multipart/mixed issues in archives

Initial Comment:
We are having problems with mailing lists that are not
being properly stripped down to text content in the
archives. When there is multipart/mixed, it doesn't
pull the multipart/alternative sections into their
appropriate text portions.

  For example, from content such as the following:

==============================================================================
>From ...
[...]
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: multipart/mixed;
        boundary=------------InterScan_NT_MIME_Boundary
[...]

This is a multi-part message in MIME format.

--------------InterScan_NT_MIME_Boundary
Content-Type: multipart/alternative;
        boundary="----_=_NextPart_001_01C336A1.2C7564BC"
Content-Transfer-Encoding: 7bit


------_=_NextPart_001_01C336A1.2C7564BC
Content-Type: text/plain;
 charset=us-ascii
Content-Transfer-Encoding: quoted-printable

Kevin has a pending checkin that addresses the
minss/maxss issue.
=20
[...]
------_=_NextPart_001_01C336A1.2C7564BC
Content-Type: text/html;
 charset=us-ascii
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0
Transitional//EN">
<HTML xmlns=3D"http://www.w3.org/TR/REC-html40" xmlns:v
=3D=20
"urn:schemas-microsoft-com:vml" xmlns:o =3D=20
"urn:schemas-microsoft-com:office:office" xmlns:w =3D=20
"urn:schemas-microsoft-com:office:word" xmlns:x =3D=20
"urn:schemas-microsoft-com:office:excel" xmlns:st1 =3D=20
"urn:schemas-microsoft-com:office:smarttags"><HEAD><TITLE>Message</TITLE>=

[...]
==============================================================================

  I only get the following:


==============================================================================
[64bit-compiler-analysis] RE: vpr analysis
Syyyy Kyyyyy syyyk at yyy.com
Thu Jun 19 14:27:16 CDT 2003

Previous message: [64bit-compiler-analysis] 06-19-03
MSFT 64-Bit C/C++ compiler
+improvement discussion
Next message: [64bit-compiler-analysis] RE: vpr analysis
Messages sorted by: [ date ] [ thread ] [ subject ] [
author ]

--------------------------------------------------------------------------------

Skipped content of type multipart/alternative


--------------------------------------------------------------------------------


Previous message: [64bit-compiler-analysis] 06-19-03
MSFT 64-Bit C/C++ compiler
+improvement discussion
Next message: [64bit-compiler-analysis] RE: vpr analysis
Messages sorted by: [ date ] [ thread ] [ subject ] [
author ]

--------------------------------------------------------------------------------
More information about the 64bit-compiler-analysis
mailing list
==============================================================================

As you can see, the actual content of the
multipart/alternative portion [text/plain and
text/html] was completely stripped out instead of
being shown as plain text.
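
The nesting at issue can be sketched with Python's standard email package.
This is only an illustration, written for Python 3 (Mailman 2.1 itself ran on
Python 2). The boundaries, the bodies, and the describe() helper are made up
for the example and are not taken from the original message:

```python
from email import message_from_string

# Simplified stand-in for the reported message (hypothetical content).
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="OUTER"

This is a multi-part message in MIME format.

--OUTER
Content-Type: multipart/alternative; boundary="INNER"

--INNER
Content-Type: text/plain; charset=us-ascii

Plain text body.
--INNER
Content-Type: text/html; charset=us-ascii

<html><body>HTML body.</body></html>
--INNER--
--OUTER--
"""


def describe(part, depth=0):
    """Return one indented line per MIME part, depth first."""
    lines = ["  " * depth + part.get_content_type()]
    if part.is_multipart():
        for sub in part.get_payload():
            lines.extend(describe(sub, depth + 1))
    return lines


msg = message_from_string(RAW)
tree = describe(msg)
print("\n".join(tree))
```

The text/plain part sits two levels down, so code that inspects only the
direct children of the multipart/mixed container never reaches it.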


----------------------------------------------------------------------

Comment By: Daniel Kahn Gillmor (rekt)
Date: 2007-11-06 21:47

Message:
Logged In: YES 
user_id=842404
Originator: NO

Thank you very much, Mark!

I'm assuming that this is the commit you're talking about:

http://marc.info/?l=mailman-cvs&m=119440136928253&w=2

I just applied the following diff to a debian lenny installation (mailman
2.1.9-8) i've been experimenting on:

--- Scrubber.py.orig    2007-11-06 21:15:30.000000000 -0500
+++ Scrubber.py 2007-11-06 21:16:07.000000000 -0500
@@ -342,7 +342,8 @@
         text = []
         for part in msg.walk():
             # TK: bug-id 1099138 and multipart
-            if not part or part.is_multipart():
+            # MAS test payload - if part may fail if there are no headers.
+            if not part._payload or part.is_multipart():
                 continue
             # All parts should be scrubbed to text/plain by now.
             partctype = part.get_content_type()

After recompiling Scrubber.py, I then did:

 /var/lib/mailman/bin/arch --wipe testlist

and it fixed a message with a similar formatting issue that had previously
been blank.
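
For readers wondering why "if not part" can skip a part that has a body: in
Python's email package a Message object's length is its header count, so a
part with no headers at all evaluates as false. A minimal sketch, assuming
Python 3's email package (the body text is a placeholder, and _payload is a
private attribute used here only because the patch uses it):

```python
from email.message import Message

# A part with a body but no headers, like the text part inside the
# multipart/signed message discussed in this report (placeholder body).
headerless = Message()
headerless.set_payload("On Thu 2007-10-11, Larry Becke wrote:\n...\n")

# Message defines __len__ as the number of headers, so this part is falsy.
header_count = len(headerless)
skipped_by_old_test = not headerless

# The patched test looks at the payload instead of the object itself.
skipped_by_new_test = not headerless._payload

print(header_count, skipped_by_old_test, skipped_by_new_test)
```

With the old test the part is skipped even though it carries text. With the
payload test it is kept.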

My only concern is that in the thread you linked to, it's mentioned that
arch --wipe can break external links.  This makes me reluctant to use it to
fix older archives with blank messages which might have accumulated
external links.  URLs should be stable!  Is this really a possible
consequence of arch --wipe?

----------------------------------------------------------------------

Comment By: Mark Sapiro (msapiro)
Date: 2007-11-06 20:46

Message:
Logged In: YES 
user_id=1123998
Originator: NO

It turns out this problem was observed and discussed at great length
in December of 2006. See the thread that begins at
<http://mail.python.org/pipermail/mailman-users/2006-December/054904.html>.

A few fixes were discussed in that thread but never implemented. I have
now tested and committed a fix along the lines of that discussion. It
will be in Mailman 2.1.10 (beta release is imminent).

----------------------------------------------------------------------

Comment By: Mark Sapiro (msapiro)
Date: 2007-11-06 17:22

Message:
Logged In: YES 
user_id=1123998
Originator: NO

You are correct. I was thinking that without the header, the following
text would be a preamble, but this is not the case.

There does appear to be a problem here, and I will look into it further.
The reconstructed message helps a lot. Thanks for that.

BTW, the problem is not with pipermail. The message is processed by
Mailman/Handlers/Scrubber.py and flattened to plain text before pipermail
ever sees it. I have verified that the underlying Python email library
parses the MIME structure correctly and sees the body as a text/plain
part.

I have some ideas, but I haven't looked closely enough to be sure. I'll
post again when I know more.

----------------------------------------------------------------------

Comment By: Daniel Kahn Gillmor (rekt)
Date: 2007-11-06 16:05

Message:
Logged In: YES 
user_id=842404
Originator: NO

Just did a bit of digging.  It looks like section 5.2 of RFC 2045 suggests
that missing content-types should be treated as:

  Content-type: text/plain; charset=us-ascii

While i agree that it would be better for the sending MUA to include an
explicit content-type for each mime part (i'm about to file a bug against
the MUA), it seems problematic for pipermail to refuse to render such a
part at all.
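
The default can be checked against Python's own parser. The following is a
sketch only, assuming Python 3's email package. The message is a cut-down,
hypothetical imitation of the multipart/signed structure, with a placeholder
where the signature would be:

```python
from email import message_from_string

# First part has no headers at all: the boundary line is followed
# directly by the blank line that ends the (empty) header block.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-=";
 micalg=pgp-sha1; protocol="application/pgp-signature"

--=-=-=

Body text with no part headers.
--=-=-=
Content-Type: application/pgp-signature

(signature placeholder)
--=-=-=--
"""

msg = message_from_string(RAW)
types = [part.get_content_type() for part in msg.walk()]
first = msg.get_payload(0)

print(types)
print(first.keys())
print(first.get_payload())
```

The parser reports the headerless part as text/plain, in line with the
default from RFC 2045, and its body text is still available.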

----------------------------------------------------------------------

Comment By: Daniel Kahn Gillmor (rekt)
Date: 2007-11-06 15:55

Message:
Logged In: YES 
user_id=842404
Originator: NO

Thanks for the response, msapiro.  marc.info's raw copy of it looks
basically identical to the version of that message that arrived in my
inbox, so i'd say it's a correct copy.  The RFC822 headers for the raw
message were:

Return-Path: <openssh-unix-dev-bounces at mindrot.org>
To: <openssh-unix-dev at mindrot.org>
Subject: Re: scp -t . - possible idea for additional parameter
From: Daniel Kahn Gillmor <dkg-openssh.com at fifthhorseman.net>
Date: Thu, 11 Oct 2007 12:34:23 -0400
Message-ID: <87y7e9d300.fsf at squeak.fifthhorseman.net>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1431543891=="

When i supply the concatenation of those headers, a blank line, and then
the raw message to msglint, the IETF's message validator [0], it outputs:

-----------
OK: found part multipart/mixed line 10
OK: preamble 10: 
OK: found part multipart/signed line 15
OK: preamble 15: 
OK: found default part text/plain line 18
OK: found part application/pgp-signature line 67
OK: epilogue 86: 
WARNING: MIME headers should only be 'Content-*'. No meaning will apply to header 'MIME-Version' at line 89
OK: found part text/plain line 93
-----------

So that validator doesn't have any problem with the message (it assumes
the part starting at line 18, which is the section you're suggesting is
invalid, is text/plain).  Is the validator wrong in assuming that?  I don't
know the relevant specifications well enough to tell myself.  Can you show
me where it's a requirement that each MIME section have a content-type?

Thanks for looking into this.

[0] http://www.apps.ietf.org/msglint.html


----------------------------------------------------------------------

Comment By: Mark Sapiro (msapiro)
Date: 2007-11-06 15:04

Message:
Logged In: YES 
user_id=1123998
Originator: NO

I can't tell for sure, but the message at
<http://marc.info/?l=openssh-unix-dev&m=119212056224122&w=2> appears to be
malformed. If I go to
<http://marc.info/?l=openssh-unix-dev&m=119212056224122&q=raw> to view the
alleged raw message, I see at the beginning:

--===============1431543891==
Content-Type: multipart/signed; boundary="=-=-=";
	micalg=pgp-sha1; protocol="application/pgp-signature"

--=-=-=

On Thu 2007-10-11 11:00:41 -0400, Larry Becke wrote:
...

I expect to see something like:

--===============1431543891==
Content-Type: multipart/signed; boundary="=-=-=";
	micalg=pgp-sha1; protocol="application/pgp-signature"

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--=-=-=
Content-Type: text/plain; charset=...
Content-Transfer-Encoding: ...

On Thu 2007-10-11 11:00:41 -0400, Larry Becke wrote:
...

I.e., I don't see a Content-Type: header for the message body. If it is in
fact missing, that would cause Mailman's behavior in this case, but it is
the message that is at fault, not Mailman.

So the question is whether or not the alleged raw message is in fact a
true representation. If it is, then I think it is the sender's MUA that is
at fault.

----------------------------------------------------------------------

Comment By: Daniel Kahn Gillmor (rekt)
Date: 2007-11-06 13:52

Message:
Logged In: YES 
user_id=842404
Originator: NO

This bug (or something very similar to it) seems to still be a problem. 
Consider the message here:

 http://marc.info/?l=openssh-unix-dev&m=119212056224122&w=2

and in its pipermail archive:


http://lists.mindrot.org/pipermail/openssh-unix-dev/2007-October/025812.html



----------------------------------------------------------------------

Comment By: Joe Pruett (q7joey)
Date: 2005-03-18 00:00

Message:
Logged In: YES 
user_id=559223

i just looked at the cvs closer and i see that the patch is
on the 2.1 branch, but hasn't made it into the trunk yet.

----------------------------------------------------------------------

Comment By: Joe Pruett (q7joey)
Date: 2005-03-17 23:52

Message:
Logged In: YES 
user_id=559223

i just started working on a 2.1.5 system and discovered that
this bug was still there.  from looking in cvs, it appears
to be fixed there (although it seems to reference an
unrelated bugid).

updating this bug to reflect the cvs update would be nice.

----------------------------------------------------------------------

Comment By: Tokio Kikuchi (tkikuchi)
Date: 2003-12-27 20:17

Message:
Logged In: YES 
user_id=67709

The patch by q7joey is merged into my Scrubber.py patch
#866238. I hope Barry can integrate it in 2.1.4.


----------------------------------------------------------------------

Comment By: Joe Pruett (q7joey)
Date: 2003-09-27 12:48

Message:
Logged In: YES 
user_id=559223

i have a few-line patch that seems to make it do what is
expected.

i can't see how to attach via sourceforge yet, so i'll paste
it here:

--- /usr/local/src/mailman-2.1.2/Mailman/Handlers/Scrubber.py   Fri Feb  7 23:13:50 2003
+++ ./Scrubber.py       Sat Sep 27 08:58:46 2003
@@ -286,11 +286,13 @@
         # BAW: Martin's original patch suggested we might want to try
         # generalizing to utf-8, and that's probably a good idea (eventually).
         text = []
-        for part in msg.get_payload():
+        for part in msg.walk():
+            if part.get_main_type() == 'multipart':
+                continue
             # All parts should be scrubbed to text/plain by now.
             partctype = part.get_content_type()
             if partctype <> 'text/plain':
-                text.append(_('Skipped content of type %(partctype)s'))
+                text.append(_('Skipped content of type %(partctype)s\n'))
                 continue
                 continue
             try:
                 t = part.get_payload(decode=1)


----------------------------------------------------------------------

Comment By: Martin RJ. Cleaver (mrjc)
Date: 2003-09-27 03:23

Message:
Logged In: YES 
user_id=50125

This fails for many of my users as they habitually attach a 
photo of themselves in their signatures. They are incredulous 
at the idea that mailman can't handle it.

Thanks

----------------------------------------------------------------------

Comment By: Joe Pruett (q7joey)
Date: 2003-09-26 21:26

Message:
Logged In: YES 
user_id=559223

i agree that this should be a high priority issue.  a simple
message with just multipart/alternative will show up in the
archive ok, but if there is any other kind of attachment,
then the entire multipart section is skipped and you just
get a link for the extra attachment for download/view
ability.  i haven't started to look at the code (and i'm not
a python/mailman person), but i'll report anything i can find.

----------------------------------------------------------------------

Comment By: Martin RJ. Cleaver (mrjc)
Date: 2003-09-22 09:34

Message:
Logged In: YES 
user_id=50125

Additionally I think it is appropriate to up the priority on this 
bug as it causes key functionality to fail.


----------------------------------------------------------------------

Comment By: Martin RJ. Cleaver (mrjc)
Date: 2003-09-22 09:26

Message:
Logged In: YES 
user_id=50125

This is causing me real problems! Are there any known 
workarounds?

If I can't fix this I might have to use a different package as 
presently all my archives are useless!


----------------------------------------------------------------------

Comment By: Pug Bainter (phelim_gervase)
Date: 2003-06-24 13:01

Message:
Logged In: YES 
user_id=484284

This appears to be within:

def process(mlist, msg, msgdata=None):

at around line 276, but I saw no way of making it recurse
for multipart/[mixed|alternative] sub-MIME parts.

----------------------------------------------------------------------
