[Mailman-Developers] "@" in mail text gets replaced in archives

Mon Sep 29 19:29:54 EDT 2003

[John A. Martin]

>>>>>> "Harald" == Harald Meland
>
>     Harald> It is not clear to me that Mailman *is* an MTA.  It is not
>     Harald> an SMTP server, and is not (necessarily) an SMTP client.
>
> To have been precise perhaps I should have said something like "a mail
> agent must not muck with an existing Message-Id except as specified by
> the applicable standards".  The Applicable Standards, to quote for
> example rfc2822, apply as follows:
>
>    This standard specifies a syntax for text messages that are sent
>    between computer users, within the framework of "electronic mail"
>    messages.

I agree that it is obvious that Mailman should strive to avoid sending
non-RFC2822-compliant messages.

However, I would think that the issue at hand is not about message
*syntax*, but rather about the *semantic* value of a message's
Message-Id.

Now that that nit is off my chest :-), I'll be quick to agree that RFC
2822 surely does contain a fair number of semantic specifications as
well; more on that below.

> The applicable standards govern what goes on the wire and therefore
> what Mailman causes to be put on the wire through an MTA should be
> compliant.

Mailman is sort of between a rock and a hard place here, as it
occupies a double role:

 * Mailman should be liberal in what it accepts -- which seems to
   imply that it should accept incoming messages even if they do not
   conform strictly to all aspects of RFC 2822.

   As one example, Mailman shouldn't offhandedly reject an incoming
   message just because there is a slight address syntax error in the
   message's From: header.

 * At the same time, Mailman should be conservative in what it sends.
   Naively, this would mean that Mailman ought to ensure that any
   message it puts on the wire conforms with RFC 2822; however, that
   would then clash either with the "liberal in what you accept"
   idea or with the "don't change the message" maxim.

>     Harald> However, even if Mailman isn't an MTA, it would be nice if
>     Harald> it *mostly* tries to follow the MTA rules.
>
>     Harald> (As a side note, I am unable to find *clear* references to
>     Harald> the effect of your statement in RFCs 2821 or 2822.)
>
> Rfc2822 Section 3.6.4 (the first paragraph below is the same paragraph
> you quoted elsewhere)
>
>    [[ ... ]]
>
>    The "Message-ID:" field provides a unique message identifier that
>    refers to a particular version of a particular message.  The
>    uniqueness of the message identifier is guaranteed by the host that
>    generates it (see below).  This message identifier is intended to be
>    machine readable and not necessarily meaningful to humans.  A message
>    identifier pertains to exactly one instantiation of a particular
>    message; subsequent revisions to the message each receive new message
>    identifiers.
>
>    Note: There are many instances when messages are "changed", but those
>    changes do not constitute a new instantiation of that message, and
>    therefore the message would not get a new message identifier.  For
>    example, when messages are introduced into the transport system, they
>    are often prepended with additional header fields such as trace
>    fields (described in section 3.6.7) and resent fields (described in
>    section 3.6.6).  The addition of such header fields does not change
>    the identity of the message and therefore the original "Message-ID:"
>    field is retained.  In all cases, it is the meaning that the sender
>    of the message wishes to convey (i.e., whether this is the same
>    message or a different message) that determines whether or not the
>    "Message-ID:" field changes, not any particular syntactic difference
>    that appears (or does not appear) in the message.
>
> Rfc822 Section 4.6.1 (in its entirety):
>
>              This field contains a unique identifier  (the  local-part
>         address  unit)  which  refers to THIS version of THIS message.
>         The uniqueness of the message identifier is guaranteed by  the
>         host  which  generates  it.  This identifier is intended to be
>         machine readable and not necessarily meaningful to humans.   A
>         message  identifier pertains to exactly one instantiation of a
>         particular message; subsequent revisions to the message should
>         each receive new message identifiers.
>
> Rfc2822 in this case merely codifies long established practice
> interpreting rfc822.  Rfc2822 Appendix A.3 may be helpful for the
> present discussion.

The part that (still) isn't clear to me is whether Mailman's action of
putting the message back on the wire can be said to be either 1)
generation of a new message (personally, I wouldn't think so) or 2) a
new instantiation of the message.

> To test for compliance with the rfc2822 determination "whether this is
> the same message or a different message" one might stipulate that if
> the PGP signature verifies it is the same message, if the PGP
> signature does not verify it is a different message.

Now we're deeply into message semantics. :-)

I'd like to point out two things about your argument:

Firstly, the RFC does not merely distinguish between "the same message
or a different message"; it also allows Message-ID: to be changed
whenever there is a new instantiation of a (single) message.

Secondly, having to resort to (your) *interpretation* of the RFC, by
using verification of PGP signatures for the test, is in my book a
clear indication that the RFC is *not* crystal clear on this issue.

> (One certainly can see by inspection what would break a signature
> without actually verifying the signature, right?)

That is my (rather shallow, I'm afraid) understanding of PGP email
signatures, yes.

>     Harald> Um.  Mailman lists have numerous configuration options for
>     Harald> changing messages (e.g. adding footers) before they are
>     Harald> sent to the list members, and it has had such options
>     Harald> since time immemorial.
>
> Who reads the RFCs to say that footers cannot be added without
> changing the message?

The more interesting issue, I think, is where should the line be
drawn; how much is Mailman allowed to change (various parts of) a
message before it should be considered a new message?

And how does the Mailman modus operandi fit in with the RFC's "new
instantiation" wording?

>     Harald> * To my mind it would not be obviously wrong to view
>     Harald>    Mailman as the *generator* of messages, at the very
>     Harald>    least in the cases where it is obvious that the
>     Harald>    previous generator didn't do its job of guaranteeing
>     Harald>    message-id uniqueness properly.
>
> Why?

Given that there exist two (or more) distinct messages that share the
same message-id, the uniqueness of this identifier (as prescribed by
RFC 2822) is clearly not satisfied.  Hence, if Mailman really wants to
have the messages it puts on the wire conform with RFC 2822, it should
take on the role of message generator, and issue distinct message-ids
for such distinct messages.

The hard problem, of course, is to properly discover whether or not
two messages are indeed distinct; they might differ slightly by
e.g. an automatically added footer, or in some other minor, but
programmatically hard to discover, fashion.
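
To illustrate why this is hard, here is a minimal sketch (not anything
Mailman actually does) of a naive collision check.  It groups messages
by Message-ID and compares a hash of the whitespace-normalized body.
The names body_digest and find_collisions are made up for this
example, and it assumes simple, non-multipart messages.

```python
import hashlib
from collections import defaultdict
from email import message_from_string


def body_digest(msg):
    """Hash only the body, so added trace headers do not count as a change."""
    # Assumption: single-part message; multipart payloads decode to None.
    payload = msg.get_payload(decode=True) or b""
    # Collapse runs of whitespace so trivial re-wrapping is ignored.
    normalized = b" ".join(payload.split())
    return hashlib.sha1(normalized).hexdigest()


def find_collisions(raw_messages):
    """Return {message-id: set of body digests} for ids seen with differing bodies."""
    seen = defaultdict(set)
    for raw in raw_messages:
        msg = message_from_string(raw)
        message_id = (msg.get("Message-ID") or "").strip()
        seen[message_id].add(body_digest(msg))
    return {mid: digests for mid, digests in seen.items() if len(digests) > 1}
```

A check like this ignores extra Received: headers and re-wrapped
whitespace, but it would still report a copy that merely gained a
list footer as a "distinct" message, which is exactly the difficulty
described above.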

> ISTM the problem you are trying to solve is how to identify the
> archive image of the message.
>
> Why not construct a URL containing a scrubbed Message-Id (as Brad
> Knowles has indicated) and a serial number (as I have indicated)?

Because, as Barry said, that would mean the "archive image identity"
of the messages could change whenever the archive needs to be rebuilt
(e.g. after a disk crash, the archives are gone, and there are no
backups; then some kind list member comes forward with a partial
archive constructed from the messages they've received from the list).
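
For contrast, here is a rough sketch of an identifier that would
survive such a rebuild, because it is computed from the message alone
rather than from its position in the archive.  This is only an
illustration under my own assumptions: the function names, the choice
of SHA-1 and the base32 encoding are all hypothetical, not an existing
Mailman feature.

```python
import base64
import hashlib


def stable_archive_id(message_id):
    """Derive an archive identifier from the Message-ID value alone."""
    mid = message_id.strip()
    # Drop the surrounding angle brackets, if present.
    if mid.startswith("<") and mid.endswith(">"):
        mid = mid[1:-1]
    digest = hashlib.sha1(mid.encode("utf-8")).digest()
    # Base32 keeps the result URL-safe; 20 digest bytes need no padding.
    return base64.b32encode(digest).decode("ascii")


def archive_url(base_url, message_id):
    """Join a (hypothetical) archive base URL and the stable identifier."""
    return base_url.rstrip("/") + "/" + stable_archive_id(message_id)
```

Rebuilding the archive from any set of copies yields the same
identifiers, in whatever order the messages arrive.  The price is
that the scheme depends entirely on Message-ID uniqueness, so two
distinct messages sharing a message-id would map to the same URL.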

> Such a URL should go into the "List-Archive" header field pointing to
> the specific message without doing violence to rfc2369 Section 3.6,
> right?

I don't think that's too far from the intention of that header, no.
That section seems rather loosely worded, something I hope was done
intentionally:

  3.6. List-Archive

     The List-Archive field describes how to access archives for the list.

     Examples:

       List-Archive: <mailto:archive@host.com?subject=index%20list>
       List-Archive: <ftp://ftp.host.com/pub/list/archive/>
       List-Archive: <http://www.host.com/list/archive/> (Web Archive)
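
Setting such a header with a per-message URL could look roughly like
the sketch below, which uses Python's standard email package.  The
helper name set_list_archive is hypothetical, and the URL is whatever
the caller supplies; whether a per-message URL is within the spirit of
the RFC is the open question discussed above.

```python
from email.message import Message


def set_list_archive(msg, url, comment=None):
    """Replace any List-Archive field with <url>, plus an optional comment."""
    value = "<%s>" % url
    if comment:
        value += " (%s)" % comment
    # Deleting a missing header is a no-op, so this is safe either way.
    del msg["List-Archive"]
    msg["List-Archive"] = value
    return msg


msg = Message()
msg["Subject"] = "test"
set_list_archive(msg, "http://www.host.com/list/archive/", "Web Archive")
```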

-- 
Harald
