[Mailman-Developers] [PATCH] mimelib base64 and q-p message decoding

Thu, 13 Sep 2001 15:18:55 +0900

This patch is against mimelib CVS.

It adds a few new extremely useful functions to mimelib.Message, which
let users recursively get all parts of a message, decoded into
viewable text format -- even non-text attachments will be replaced
with a user-configurable message!

It's required for the next patch I'm about to send, which ports
pipermail/HyperArch and various other scripts from
pythonlib/rfc822.py to mimelib.  This is a huge help, because the code
was very difficult to follow while two slightly different Message
classes (mimelib's and rfc822's) were in use.

In addition, it adds a feature to Mailman that's been requested for
some time: Base64-encoded gobbledygook will no longer show up in the
archives!  Instead, non-viewable attachments will be displayed (by
default) as:

[Non-text (image/gif) part of message not included, filename foo.gif]

Also, users who attach multiple text files to a message will
have them all show up in Pipermail, without the message delimiter
gobbledygook.  Both quoted-printable and base64 are dealt with in this
patch, so Pine users who attach text files encoded in base64 will have
them show up properly in the archives.

This also lets us remove the quoted-printable code from Pipermail
completely, as all the MIME-related stuff should be dealt with in
mimelib from now on.

This patch also includes my previous mimelib patch, which adds the
getcharsets function.  Here's a list of the functions this patch adds.
It doesn't modify anything else besides including a few more modules.

getfilename(self, failobj=None):
    Returns the filename associated with the payload if present.
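
For illustration, the lookup described above can be sketched as a
standalone function.  This is only a sketch under assumptions: it is
written for a current Python 3 rather than the Python this patch
targets, the name filename_from_disposition is hypothetical, and
email.utils.unquote stands in for rfc822.unquote.

```python
import re
from email.utils import unquote


def filename_from_disposition(disp, failobj=None):
    """Pull the filename parameter out of a Content-Disposition value.

    Hypothetical standalone helper; the patch does this as a method.
    """
    if disp is None:
        return failobj
    # Match up to, but not including, the next semicolon (if any).
    match = re.search(r'filename\s*=\s*([^;]*)', disp, re.IGNORECASE)
    if match is None:
        return failobj
    # Trim trailing whitespace, then strip surrounding quotes.
    return unquote(match.group(1).rstrip())


print(filename_from_disposition('attachment; filename="foo.gif"'))
```

Note that a quoted filename containing a semicolon would be cut short
by this pattern; a full parameter parser would be needed for that case.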

decode_body(self):
    Returns a string of the non-multipart message's body, decoded.

    If the message is encoded with quoted-printable or base64, will
    decode and return its payload.  Otherwise, returns the payload
    as-is.

    Returns None if the message is multipart.
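
The dispatch on Content-Transfer-Encoding can be sketched roughly as
follows.  Assumptions: Python 3, a payload already available as bytes,
and a hypothetical function name (decode_transfer_encoding); stripping
whitespace from the header value is an addition the patch does not
make.

```python
import base64
import quopri


def decode_transfer_encoding(payload, cte=None):
    """Undo base64 or quoted-printable; pass anything else through.

    payload: bytes of a non-multipart body.
    cte: value of the Content-Transfer-Encoding header, or None.
    """
    encoding = (cte or "").strip().lower()
    if encoding == "quoted-printable":
        return quopri.decodestring(payload)
    if encoding == "base64":
        return base64.b64decode(payload)
    # No header, or 7bit / 8bit / binary: leave the payload alone.
    return payload


print(decode_transfer_encoding(b"aGVsbG8=", "base64"))
print(decode_transfer_encoding(b"caf=E9", "quoted-printable"))
```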

get_decoded_payload(self, non_text_msg):
    Returns an array containing all decoded text parts of a MIME message.

    Will recurse through all payloads even if the message is
    multi-part.  When a text part is seen, decodes the text if it's
    Base64 or Quoted-Printable encoded, and appends the (possibly
    decoded) text part to the end of the resulting array.

    When a non-text part is seen, replaces it with non_text_msg, which
    defaults to:

    [Non-text (%(type)s) part of message not included, filename %(filename)s]\n

    and appends it to the end of the resulting array.

    The following keywords will be expanded in non_text_msg:

    type: Full MIME type of the non-text part
    maintype: Main MIME type of the non-text part
    subtype: Sub MIME type of the non-text part
    filename: Filename of the non-text part
    description: Description associated with the non-text part
    transfer-encoding: Transfer encoding of the non-text part
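
The keyword expansion is ordinary dictionary-based % formatting, so a
custom non_text_msg can use any subset of the keys above.  A small
sketch (Python 3; the field values and the expand helper are made up
for illustration):

```python
DEFAULT_NON_TEXT = ("[Non-text (%(type)s) part of message not included, "
                    "filename %(filename)s]\n")

# Example values for one hypothetical GIF attachment.
fields = {
    "type": "image/gif",
    "maintype": "image",
    "subtype": "gif",
    "filename": "foo.gif",
    "description": None,
    "transfer-encoding": "base64",
}


def expand(template, values):
    """Fill the %(key)s placeholders of a non_text_msg template."""
    return template % values


print(expand(DEFAULT_NON_TEXT, fields), end="")
print(expand("<%(maintype)s attachment, %(transfer-encoding)s encoded>",
             fields))
```

A key with no value (description here) would be rendered as the
string "None" if the template referenced it.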

get_text_payload(self, non_text_msg):
    Return the decoded body of the message in a text format.

    If the message is not text, will return non_text_msg, formatted
    as described in get_decoded_payload().
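
To show how get_text_payload and get_decoded_payload fit together,
here is a rough sketch written against the standard library's email
package (the successor to mimelib) on Python 3, not against mimelib
itself.  The function names text_payload and decoded_payload and the
sample message are assumptions made for the example; nested
multiparts are flattened into a single list.

```python
import email

DEFAULT_NON_TEXT = ("[Non-text (%(type)s) part of message not included, "
                    "filename %(filename)s]\n")


def text_payload(part, non_text_msg=DEFAULT_NON_TEXT):
    """Decoded text of one non-multipart part, or the placeholder."""
    if part.get_content_maintype() == "text":
        charset = part.get_content_charset() or "us-ascii"
        # get_payload(decode=True) undoes base64 / quoted-printable.
        return part.get_payload(decode=True).decode(charset, "replace")
    return non_text_msg % {
        "type": part.get_content_type(),
        "maintype": part.get_content_maintype(),
        "subtype": part.get_content_subtype(),
        "filename": part.get_filename(),
        "description": part.get("Content-Description"),
        "transfer-encoding": part.get("Content-Transfer-Encoding"),
    }


def decoded_payload(msg, non_text_msg=DEFAULT_NON_TEXT):
    """Flat list with one string per leaf part of the message."""
    if not msg.is_multipart():
        return [text_payload(msg, non_text_msg)]
    result = []
    for part in msg.get_payload():
        result.extend(decoded_payload(part, non_text_msg))
    return result


SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "aGVsbG8K\n"
    "--XX\n"
    "Content-Type: image/gif\n"
    'Content-Disposition: attachment; filename="foo.gif"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "R0lGODlh\n"
    "--XX--\n"
)

for chunk in decoded_payload(email.message_from_string(SAMPLE)):
    print(chunk, end="")
```

An unknown charset name would still raise LookupError in this sketch;
real archiving code would want a fallback there.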

getcharsets(self, default):
    Return an array containing the charset[s] used in a message.

    Returns an array containing one element for each part of the
    message; will return an array of one element if the message is not
    a multipart message.
        
    Each element will either be a string (the charset in the
    Content-Type of that part) or the value of the 'default' parameter
    (defaults to None), if the part is not a text part or the charset
    is not defined.
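
The documented behaviour can be sketched as follows, again using the
standard library's email package on Python 3 as a stand-in for
mimelib.  The function name charsets and the sample message are
assumptions for the example; like the description above, it only
looks at the top-level parts.

```python
import email


def charsets(msg, default=None):
    """One entry per top-level part; a single entry if not multipart."""
    parts = msg.get_payload() if msg.is_multipart() else [msg]
    result = []
    for part in parts:
        if part.get_content_maintype() == "text":
            # get_param falls back to `default` when charset is absent.
            result.append(part.get_param("charset", default))
        else:
            result.append(default)
    return result


SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain; charset=iso-8859-1\n"
    "\n"
    "first part\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "second part, no charset given\n"
    "--XX\n"
    "Content-Type: image/gif\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "R0lGODlh\n"
    "--XX--\n"
)

msg = email.message_from_string(SAMPLE)
print(charsets(msg))
print(charsets(msg, "us-ascii"))
```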

Patch follows.

Index: mimelib/Message.py
===================================================================
RCS file: /cvsroot/mimelib/mimelib/mimelib/Message.py,v
retrieving revision 1.13
diff -u -r1.13 Message.py

--- mimelib/Message.py	2001/05/04 18:47:22	1.13
+++ mimelib/Message.py	2001/09/13 06:15:07
@@ -5,6 +5,11 @@
 
 import re
 import address
+import base64
+import quopri
+import string
+from StringIO import StringIO
+from rfc822 import unquote
 from types import ListType
 
 SEMISPACE = '; '
@@ -272,3 +277,132 @@
             if name.lower() == param:
                 return address.unquote(val)
         return failobj
+
+    def getcharsets(self, default=None):
+        """Return an array containing the charset[s] used in a message.
+    
+        Returns an array containing one element for each part of the
+        message; will return an array of one element if the message is not
+        a multipart message.
+        
+        Each element will either be a string (the charset in the
+        Content-Type of that part) or the value of the 'default'
+        parameter (defaults to None), if the part is not a text part
+        or the charset is not defined.
+        """
+        result = []
+        
+        if self.ismultipart():
+            for p in self.get_payload():
+                if p.getmaintype() == "text":
+                    result.append(p.getparam("charset", default))
+                else:
+                    result.append(default)
+        else:
+            if self.getmaintype() == "text":
+                result.append(self.getparam("charset", default))
+            else:
+                result.append(default)
+
+        return result
+
+    def getfilename(self, failobj=None):
+        """Return the filename associated with the payload if present."""
+
+        disp = self.get("Content-Disposition")
+
+        if disp is None:
+            return failobj
+
+        # Match up to and not including the next semicolon, if any
+        filename = re.search(r'filename\s*=\s*([^;]*)', disp, re.IGNORECASE)
+
+        if filename is None:
+            return failobj
+
+        # Trim the whitespace if there was any at the end of the filename,
+        # then remove quotes from around it.
+        return unquote(string.rstrip(filename.group(1)))
+
+    def get_decoded_payload(self, non_text_msg="[Non-text (%(type)s) part of message not included, filename %(filename)s]\n"):
+        """Return an array containing all decoded text parts of a MIME message.
+
+        Will recurse through all payloads even if the message is
+        multi-part.  When a text part is seen, decodes the text if
+        it's Base64 or Quoted-Printable encoded, and appends the
+        (possibly decoded) text part to the end of the resulting
+        array.
+
+        When a non-text part is seen, replaces it with non_text_msg,
+        which defaults to:
+
+        [Non-text (%(type)s) part of message not included, filename %(filename)s]\n
+
+        and appends it to the end of the resulting array.
+
+        The following keywords will be expanded in non_text_msg:
+
+        type: Full MIME type of the non-text part
+        maintype: Main MIME type of the non-text part
+        subtype: Sub MIME type of the non-text part
+        filename: Filename of the non-text part
+        description: Description associated with the non-text part
+        transfer-encoding: Transfer encoding of the non-text part
+        """
+        result = []
+        if self.ismultipart():
+            for p in self.get_payload():
+                # Nested multiparts are legal MIME; flatten them into the result
+                if p.ismultipart():
+                    result.extend(p.get_decoded_payload(non_text_msg))
+                else:
+                    result.append(p.get_text_payload(non_text_msg))
+        else:
+            result.append(self.get_text_payload(non_text_msg))
+            
+        return result
+
+    def get_text_payload(self, non_text_msg):
+        """Return the decoded body of the message in a text format.
+
+        If the message is not text, will return non_text_msg, formatted
+        as described in get_decoded_payload().
+        """
+        if self.getmaintype('text') == 'text':
+            return self.decode_body()
+        else:
+            return non_text_msg % {
+                'type': self.gettype(),
+                'maintype': self.getmaintype(),
+                'subtype': self.getsubtype(),
+                'filename': self.getfilename(),
+                'description': self.get('Content-Description'),
+                'transfer-encoding': self.get('Content-Transfer-Encoding')
+                }
+
+    def decode_body(self):
+        """Return a string of the non-multipart message's body, decoded.
+
+        If the message is encoded with quoted-printable or base64, will
+        decode and return its payload.  Otherwise, returns the payload
+        as-is.
+
+        Returns None if the message is multipart.
+        """
+        if self.ismultipart():
+            return None
+        
+        cte = self.get('Content-Transfer-Encoding')
+        # Assume no encoding if header not specified
+        if cte is None:
+            return self.get_payload()
+        elif string.lower(cte) == 'quoted-printable':
+            input = StringIO(self.get_payload())
+            output = StringIO()
+            quopri.decode(input, output)
+            return output.getvalue()
+        elif string.lower(cte) == 'base64':
+            return base64.decodestring(self.get_payload())
+        # Otherwise, could be 7bit, 8bit, or binary.. don't mess with it
+        else:
+            return self.get_payload()