[Mailman-Developers] [PATCH] mimelib base64 and q-p message decoding

Barry A. Warsaw barry@zope.com
Fri, 14 Sep 2001 17:52:45 -0400


[mimelib-devel'ers, please see
http://mail.python.org/pipermail/mailman-developers/2001-September/009588.html
-BAW]

>>>>> "BG" == Ben Gertzfield <che@debian.org> writes:

    BG> This patch is against mimelib CVS.

Ben,

I really like what you've added here, and intend to merge those into
mimelib.  First a couple of general comments and then some specific
ones.

mimelib will soon be merged into the Python 2.2 standard library,
under the package name `email'.  It will generally have the same class
structure, modules, etc. although it is likely that some of the method
names will be changed.  I plan to release a mimelib 0.5 in which I
will include your changes (see below), along with a few other patches
and bug fixes I've collected.

I plan on including the email package in Python 2.2a4 to be released
next week.

When I merge it into Python 2.2, I will essentially stop releasing
mimelib separately, although I may piggyback off the SF project for a
while so that I can do distutils-based releases of the email package;
we'll see how easy that turns out to be.  If difficult, then I'll add
this stuff to Mailman/pythonlib for the next Mailman alpha.

    BG> It adds a few new extremely useful functions to
    BG> mimelib.Message, which let users recursively get all parts of
    BG> a message, decoded into viewable text format -- even non-text
    BG> attachments will be replaced with a user-configurable message!

I wonder if we couldn't generalize some of this into a "subpart
walker", a la os.path.walk()?  I'm not going to do that now, but it's
something to keep in mind for later.
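
As a rough sketch of that idea, a callback-style walker in the spirit
of os.path.walk() might look like the following.  It is written
against the standard library email package as it exists in current
Python 3, not against mimelib 0.x, and the names walk_subparts and
visit are hypothetical.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def walk_subparts(msg, visit, arg=None):
    """Call visit(arg, part) for msg and then, depth first, each subpart.

    Hypothetical helper: mirrors the os.path.walk() calling convention,
    where a caller-supplied argument is threaded through to the callback.
    """
    visit(arg, msg)
    if msg.is_multipart():
        # For a multipart message, get_payload() is a list of Message objects.
        for part in msg.get_payload():
            walk_subparts(part, visit, arg)


# Build a small nested message to walk.
inner = MIMEMultipart('alternative')
inner.attach(MIMEText('plain version', 'plain'))
inner.attach(MIMEText('<p>html version</p>', 'html'))
outer = MIMEMultipart('mixed')
outer.attach(MIMEText('cover note', 'plain'))
outer.attach(inner)

# Collect the content type of every part the walker visits.
seen = []
walk_subparts(outer, lambda acc, part: acc.append(part.get_content_type()), seen)
```

The email package that grew out of mimelib also provides a generator
form of the same traversal, Message.walk().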

    BG> It's required for the next patch I'm about to send, which
    BG> ports pipermail/HyperArch and a few various other scripts from
    BG> pythonlib/rfc822.py to mimelib.  This is a huge help, because
    BG> it was very difficult to follow the code when two slightly
    BG> different Message classes (mimelib's and rfc822's) were being
    BG> used.

Indeed!  Boy, I'm glad someone has the nerve to dive into the archiver
code. :)  It'll be way cool to eliminate the need for rfc822, except
internally for some of mimelib's implementation, which will eventually
go away.

    BG> In addition, it adds a feature to Mailman that's been
    BG> requested for some time: Base64-encoded gobbledygook will no
    BG> longer show up in the archives!  Instead, non-viewable
    BG> attachments will be displayed (by default) as:

    BG> [Non-text (image/gif) part of message not included, filename
    BG> foo.gif]

Awesome!  I hope you don't mind that I changed this message just a
little bit:

    [Non-text (%(type)s) part of message omitted, filename %(filename)s]\n

Also, is the trailing newline necessary?

    BG> In addition, users who attach multiple text files to a message
    BG> will have them all show up in Pipermail, without the message
    BG> delimiter gobbledygook.  Both quoted-printable and base64 are
    BG> dealt with in this patch, so Pine users who attach text files
    BG> encoded in base64 will have them show up properly in the
    BG> archives.

    BG> This also lets us remove the quoted-printable code from
    BG> Pipermail completely, as all the MIME-related stuff should be
    BG> dealt with in mimelib from now on.

All this is truly fantastic, thanks!

    BG> This patch also includes my previous mimelib patch, which adds
    BG> the getcharsets function.  Here's a list of the functions this
    BG> patch adds.  It doesn't modify anything else besides including
    BG> a few more modules.

I'm going to simplify some of the implementations when I check them
in, and I may also change the method names, although perhaps I should
keep yours for `backwards' compatibility?

    | getfilename(self, failobj=None):
    |     Returns the filename associated with the payload if present.

I'm also adding a getboundary() since I tend to use that a lot!

    | decode_body(self):
    |     Returns a string of the non-multipart message's body, decoded.

Here's where I've hit a conundrum.  What's the difference between
"body" and "payload"?  To me, the body contains the entire flattened
contents of the outer message, while the payload contains just the
first level down from the outer message.  I.e. it is definitely
possible to have nested multiparts, e.g. a multipart/mixed which
contains some stuff including a multipart/digest -- think Mailman's
MIME digests!

Thus the outer message's body would include all the multipart/digest's
message/rfc822 subparts, but the outer message's payload would include
just the multipart/digest object.

Along those lines, I'd call this method getdecodedpayload() since it
doesn't recurse.
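
To make the payload/body distinction concrete, here is a minimal
sketch using the current standard library email package (an
assumption; mimelib 0.x method names differ).  The message contents
are made up for illustration.

```python
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# A multipart/digest holding two message/rfc822 subparts.
digest = MIMEMultipart('digest')
for subject in ('first post', 'second post'):
    original = MIMEText('hello\n')
    original['Subject'] = subject
    digest.attach(MIMEMessage(original))

# The outer multipart/mixed: a text part plus the digest.
outer = MIMEMultipart('mixed')
outer.attach(MIMEText('Table of contents\n'))
outer.attach(digest)

# "Payload": only the first level down from the outer message.
payload_types = [part.get_content_type() for part in outer.get_payload()]

# "Body": everything reachable from the outer message, flattened.
# walk() yields the outer message itself first, then every subpart.
body_types = [part.get_content_type() for part in outer.walk()]
```

Here payload_types has two entries (the text part and the digest
object), while body_types also reaches down into each message/rfc822
subpart of the digest.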

    | get_decoded_payload(self, non_text_msg):
    |    Returns an array containing all decoded text parts of a MIME message.

So this one can't be called get_decoded_payload() :).  I propose
getpayloadastext().  Also, I'm setting non_text_msg=None as a default
value, and if non_text_msg is None, then I set it to the default
string shown above before I interpolate the dictionary.  I'm also
shortening the 'transfer-encoding' key to just 'encoding', and giving
all the other self.get*() methods failobjs like '[no MIME type]' so
they don't try to interpolate "None" if it's missing.
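
The default-plus-failobj pattern could be sketched roughly as below.
This uses the current standard library email package rather than
mimelib 0.x; the helper name placeholder, the constant name, and the
'[no filename]' and '[no encoding]' strings are assumptions made for
illustration.

```python
from email.mime.base import MIMEBase

# Default template, using the dictionary keys discussed in this message.
DEFAULT_NON_TEXT_MSG = (
    '[Non-text (%(type)s) part of message omitted, filename %(filename)s]')


def placeholder(part, non_text_msg=None):
    """Return the text that stands in for a non-text part (hypothetical)."""
    if non_text_msg is None:
        non_text_msg = DEFAULT_NON_TEXT_MSG
    info = {
        # get_content_type() always returns a string, so no failobj needed.
        'type': part.get_content_type(),
        # Failobjs keep a literal "None" out of the interpolated text.
        'filename': part.get_filename('[no filename]'),
        'encoding': part.get('Content-Transfer-Encoding', '[no encoding]'),
    }
    return non_text_msg % info


# A part with a filename, and one with no Content-Disposition at all.
image = MIMEBase('image', 'gif')
image.add_header('Content-Disposition', 'attachment', filename='foo.gif')
bare = MIMEBase('application', 'octet-stream')

with_name = placeholder(image)
without_name = placeholder(bare)
custom = placeholder(bare, '<%(type)s, %(encoding)s>')
```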

    | get_text_payload(self, non_text_msg):
    |     Return the decoded body of the message in a text format.

So, because this one recurses, I propose to call it getbodyastext().
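
A recursive version might be sketched as follows.  This is not the
patch's implementation; it is a minimal illustration against the
current standard library email package, and body_as_text and its
default template are hypothetical names.

```python
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def body_as_text(msg,
                 non_text_msg='[Non-text (%(type)s) part of message omitted]'):
    """Recursively flatten msg into text, decoding base64 and q-p parts."""
    if msg.is_multipart():
        # Recurse into every subpart and join the results.
        return '\n'.join(body_as_text(part, non_text_msg)
                         for part in msg.get_payload())
    if msg.get_content_maintype() != 'text':
        return non_text_msg % {'type': msg.get_content_type()}
    # decode=True undoes the Content-Transfer-Encoding and returns bytes.
    raw = msg.get_payload(decode=True)
    charset = msg.get_content_charset('us-ascii')
    return raw.decode(charset, errors='replace')


# One base64 text part, one quoted-printable text part, one image.
b64_part = MIMEText('caf\u00e9 au lait', 'plain', 'utf-8')
qp_part = MIMEText('na\u00efve text', 'plain', 'iso-8859-1')
gif_part = MIMEBase('image', 'gif')
gif_part.set_payload(b'GIF89a')
encoders.encode_base64(gif_part)

outer = MIMEMultipart('mixed')
for part in (b64_part, qp_part, gif_part):
    outer.attach(part)

text = body_as_text(outer)
```

The email package picks base64 for the utf-8 part and quoted-printable
for the iso-8859-1 part by default, so both decoding paths are
exercised.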

Side note: the naming scheme in mimelib.Message is getting both
inconsistent and clumsy.  I intend to rectify this when I merge it
into Py2.2.  Question: is backwards compatibility with mimelib 0.x
important?

I claim it isn't, even though this means a lot of busy work for me,
fixing Mailman's code.  Heck, that's what 0.x releases are for!  I'm
leaning toward a naming scheme such as:

    def getDecodedPayload()
    def getPayloadAsText()
    def getBodyAsText()

modulo comments from Guido.

    | getcharsets(self, default):
    |   Return an array containing the charset[s] used in a message.

Cool.
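
For comparison, the current standard library email package exposes a
similar method, Message.get_charsets().  A small sketch follows; the
message contents are made up, and this is not the mimelib 0.x
signature quoted above.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

outer = MIMEMultipart('mixed')
outer.attach(MIMEText('plain ascii'))
outer.attach(MIMEText('caf\u00e9', 'plain', 'iso-8859-1'))

# One entry per part in walk() order, starting with the outer message.
# The argument is the value used for parts that declare no charset.
charsets = outer.get_charsets('us-ascii')
```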

Comments?  I will likely check something in tonight, although I'll
need to add unittest cases and documentation.

Thanks,
-Barry