[Mailman-Developers] [PATCH] mimelib base64 and q-p message decoding

Ben Gertzfield che@debian.org
Sat, 15 Sep 2001 10:11:27 +0900


>>>>> "BAW" == Barry A Warsaw <barry@zope.com> writes:
>>>>> "BG" == Ben Gertzfield <che@debian.org> writes:

    BAW> I really like what you've added here, and intend to merge
    BAW> those into mimelib.  First a couple of general comments and
    BAW> then some specific ones.

Great!  I appreciate all the comments and the quick attention. I hope
I'm not being a pain, sending off all these rapid-fire patches. *grin*

    BG> It adds a few new extremely useful functions to
    BG> mimelib.Message, which let users recursively get all parts of
    BG> a message, decoded into viewable text format -- even non-text
    BG> attachments will be replaced with a user-configurable message!

    BAW> I wonder if we couldn't generalize some of this into a
    BAW> "subpart walker", a la os.path.walk()?  I'm not going to do
    BAW> that now, but it's something to keep in mind for later.

Yes, eventually this should be generalized into something that will
simply decode even non-text attachments (and even replace them with
Image/etc objects? Hmm.) but I wanted to get something done now.
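
A rough sketch of what such a callback-style walker could look like.
It uses today's standard-library email package as a stand-in for
mimelib, which is an assumption, and walk_parts() and the sample
message are hypothetical names made up for illustration:

```python
from email import message_from_string


def walk_parts(msg, visit, depth=0):
    """Call visit(part, depth) for msg and, recursively, every subpart."""
    visit(msg, depth)
    if msg.is_multipart():
        # For a multipart message the payload is a list of sub-messages.
        for sub in msg.get_payload():
            walk_parts(sub, visit, depth + 1)


# Hypothetical two-part message used only to exercise the walker.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XX"

--XX
Content-Type: text/plain

hello
--XX
Content-Type: image/gif
Content-Transfer-Encoding: base64

R0lGODlh
--XX--
"""

seen = []
walk_parts(message_from_string(RAW),
           lambda part, depth: seen.append((depth, part.get_content_type())))
```

The standard library's Message.walk() generator covers the same
ground without a callback; the callback form is shown only because it
mirrors os.path.walk().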

    BAW> Indeed!  Boy, I'm glad someone has the nerve to dive into the
    BAW> archiver code. :) It'll be way cool to eliminate the need for
    BAW> rfc822, except internally for some of mimelib's
    BAW> implementation, which will eventually go away.

Pipermail is not wholly bad, but it is extremely inefficient.  Since
so many people are never going to use anything but the built-in
archiver, I'd like to make it pretty solid, including behaving well on
huge lists.

    BAW> Awesome!  I hope you don't mind that I changed this message
    BAW> just a little bit:

    BAW>     [Non-text (%(type)s) part of message omitted, filename
    BAW> %(filename)s]\n

    BAW> Also, is the trailing newline necessary?

Looks fine!  I expected lots of things to change.

The trailing newline is an issue I wanted to address; currently,
multiple text parts just show up one after another (they are separated
by a newline only if the message itself had one between them).

I'd be happier doing something like how Gnus shows multiple attachments
in a message:

(snip here)

This is the first, text part of the message.

[3. Interesting test --- text/plain; test]...

This is the (second? third?) text part of the message after the first
text attachment.

[5. Interesting pic --- image/gif; gm-icon00.gif]...

This is the (third? fourth?) text part of the message after the GIF
image.

(snip here)

Now, we're already replacing non-text attachments with a message,
but it'd be nice to announce text-attachments too, so things don't
get confusing.
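
The replacement of non-text parts can be sketched roughly as follows.
This is not the code from the patch: it assumes the standard-library
email package in place of mimelib, and render_as_text() and the
"not available" fallback are hypothetical.  Only the template wording
comes from the message quoted above.

```python
from email import message_from_string

# Template wording as quoted earlier in this thread.
PLACEHOLDER = ("[Non-text (%(type)s) part of message omitted, "
               "filename %(filename)s]\n")


def render_as_text(msg, template=PLACEHOLDER):
    """Join all text parts; replace every other leaf part with a notice."""
    chunks = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no text of their own
        if part.get_content_maintype() == "text":
            charset = part.get_content_charset() or "us-ascii"
            chunks.append(part.get_payload(decode=True).decode(charset, "replace"))
        else:
            chunks.append(template % {
                "type": part.get_content_type(),
                # Hypothetical fallback when the part names no file.
                "filename": part.get_filename() or "not available",
            })
    return "".join(chunks)


# Hypothetical sample message: one text part, one GIF attachment.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XX"

--XX
Content-Type: text/plain; charset="us-ascii"

This is the first, text part of the message.
--XX
Content-Type: image/gif
Content-Disposition: attachment; filename="gm-icon00.gif"
Content-Transfer-Encoding: base64

R0lGODlh
--XX--
"""

text = render_as_text(message_from_string(RAW))
```

Announcing text attachments as well would only need a second template
in the text branch.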

Also, I forgot to test what happens with text/html parts.  Will they
get HTML escaped or will they mess up the document?
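
For reference, escaping a part before it lands in an archive page
could look like the sketch below.  It uses the standard-library
html.escape(); html_safe() is a hypothetical helper, and whether the
archiver already does something equivalent is exactly the open
question above.

```python
from html import escape


def html_safe(text):
    """Escape &, <, > and quotes so text cannot inject markup into a page."""
    return escape(text, quote=True)


# Hypothetical snippet standing in for the body of a text/html part.
snippet = '<b>hi</b> & "bye"'
escaped = html_safe(snippet)
```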

    BAW> I'm going to simplify some of the implementations when I
    BAW> check them in, and I may also change the method names,
    BAW> although perhaps I should keep yours for `backwards'
    BAW> compatibility?

    BAW> Side note: the naming scheme in mimelib.Message is getting
    BAW> both inconsistent and clumsy.  I intend to rectify this when
    BAW> I merge it into Py2.2.  Question: is backwards compatibility
    BAW> with mimelib 0.x important?

Please, please, go ahead and change all the names!  I actually agree
that mimelib.Message is a big old mess, and the get_foo_bar vs.
getfoobar functions kind of got on my nerves. :) I have no attachment
whatsoever to the current names, but I think the right thing to do
would be to at least make the old names call the new ones (and just
segregate them in the documentation as a list of compatibility
interface names).
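
A minimal sketch of that compatibility approach.  The class body is
invented for illustration, and which spelling counts as the "old" one
is an assumption; get_foo_bar and getfoobar are just the placeholder
names used above, not real mimelib methods.

```python
class Message:
    """Toy stand-in for mimelib.Message (hypothetical, illustration only)."""

    def __init__(self, foo_bar=None):
        self._foo_bar = foo_bar

    # New, consistently named interface.
    def get_foo_bar(self):
        return self._foo_bar

    # Compatibility interface: the old name simply calls the new one,
    # so both spellings always return the same thing.
    def getfoobar(self):
        return self.get_foo_bar()


msg = Message(foo_bar="spam")
```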

    BAW> I'm also adding a getboundary() since I tend to use that a
    BAW> lot!

Very useful.

    BAW>     | decode_body(self):
    BAW>     |     Returns a string of the non-multipart message's
    BAW>     |     body, decoded.

    BAW> Here's where I've hit a conundrum.  What's the difference
    BAW> between "body" and "payload"?  To me, the body contains the
    BAW> entire flattened contents of the outer message, while the
    BAW> payload contains just first level down from the outer
    BAW> message.  I.e. it is definitely possible to have nested
    BAW> multiparts, e.g. multipart/mixed which contains some stuff
    BAW> including a multipart/digest -- think Mailman's MIME digests!

I guess my concept was the reverse; since a "payload" from
mimelib.Message can be either a list of payloads OR a single message,
my brain called the former a "payload" and the latter a "body".  But
I still don't really know the right terminology.

We should probably come up with some clearer nomenclature, because
the current one really confused me. :)

Basically, we need a distinction between functions which take a
(possibly) multipart message, and ones that only take a single-part
message.

Rereading RFC 2045, I see that I subconsciously got my naming scheme
from a past reading of it.  Here's what RFC 2045 has to say:

[begin quote]

2.3. Message 

The term "message", when not further qualified, means either a
(complete or "top-level") RFC 822 message being transferred on a
network, or a message encapsulated in a body of type "message/rfc822"
or "message/partial".

2.4. Entity 

The term "entity", refers specifically to the MIME-defined header
fields and contents of either a message or one of the parts in the
body of a multipart entity. The specification of such entities is the
essence of MIME. Since the contents of an entity are often called the
"body", it makes sense to speak about the body of an entity. Any sort
of field may be present in the header of an entity, but only those
fields whose names begin with "content-" actually have any
MIME-related meaning. Note that this does NOT imply thay they have no
meaning at all -- an entity that is also a message has non-MIME header
fields whose meanings are defined by RFC 822.

2.5. Body Part 

The term "body part" refers to an entity inside of a multipart entity. 

2.6. Body 

The term "body", when not further qualified, means the body of an
entity, that is, the body of either a message or of a body part.

[end quote]

So, where you say "payload", RFC 2045 says "body", and where you
say "body", the RFC says "message".  How confusing! :)

I would suggest that we should go with the RFC's naming scheme, and
just make up a new term, something like single_body or single_entity,
to refer to MIME parts that themselves are not multipart.

So, how about these names.  I'm following the RFC here, using 'single
entity' to mean a non-multipart payload, 'message' to refer to a full,
standalone MIME message, and 'body' to refer to the body of a message.

decode_body -> decodeSingleEntity
get_decoded_payload -> getMessageAsText
get_text_payload -> getBodyAsText

Do they all have to start with 'get'?  I guess that's a matter of
taste.
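
To make the "single entity" idea concrete, here is a sketch of what
decodeSingleEntity might do.  Only the name comes from the list above;
the body is an assumption built on the standard-library base64, quopri
and email modules, not the code from the patch.

```python
import base64
import quopri
from email import message_from_string


def decodeSingleEntity(entity):
    """Return the decoded body of a non-multipart entity as bytes."""
    if entity.is_multipart():
        raise TypeError("decodeSingleEntity() needs a non-multipart entity")
    # base64 and quoted-printable bodies are plain ASCII; any other
    # character is replaced rather than raising.
    data = entity.get_payload().encode("ascii", "replace")
    encoding = entity.get("Content-Transfer-Encoding", "").strip().lower()
    if encoding == "base64":
        return base64.b64decode(data)
    if encoding == "quoted-printable":
        return quopri.decodestring(data)
    # 7bit, 8bit, binary, or no header at all: nothing to undo.
    return data


# Hypothetical single-entity messages, one per transfer encoding.
qp_msg = message_from_string(
    "Content-Type: text/plain; charset=iso-8859-1\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "caf=E9\n")
b64_msg = message_from_string(
    "Content-Type: text/plain\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "aGVsbG8=\n")
```

Rejecting multipart input up front keeps the split between "takes a
single entity" and "takes a whole message" visible in the interface.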

    BAW> Comments?  I will likely check something in tonight, although
    BAW> I'll need to add unittest cases and documentation.

I hope this is helpful. :) Let me know what you think; I like
'single entity' to refer to a non-multipart part of a message,
'message' to refer to the whole thing, and 'body' to refer to
the (possibly multipart) body of a message.

Ben

-- 
Brought to you by the letters K and B and the number 7.
"Wuzzle means to mix."
Debian GNU/Linux maintainer of Gimp and GTK+ -- http://www.debian.org/