Mailergate (was: python docs search for 'print')

Thomas 'PointedEars' Lahn PointedEars at web.de
Wed Sep 5 14:42:18 EDT 2012


Steven D'Aprano wrote:

> On Tue, 04 Sep 2012 20:27:38 +0200, Thomas 'PointedEars' Lahn wrote:
>> ¹  The other mess they created (or allowed to be created) is this mashup
>>    of newsgroup and mailing list, neither of which works properly,
> 
> In what way do they not work properly?

Most prominently, threads are completely and utterly borken.

>>    because the underlying protocols are not compatible.
> 
> What?
> 
> That is rather like saying that you can't read email via a web interface
> because the http protocol is not compatible with the smtp protocol.

Apples and oranges.  The problem is gating messages from a mail server to a 
news server, and vice versa, without regard to the differences between the 
underlying protocols.

Netnews User Agents (NUAs, newsreaders) are currently based on [RFC3977] 
and [RFC5536].

In a Netnews article, a References header field is mandatory for a posting 
that is a follow-up.  (Threading by Subject and Date works poorly, if at 
all, so the specification does not suggest it.)  The last element of the 
References header field value has to be the Message-ID of the article's 
precursor.  That Message-ID has to match the Message-ID header field value 
of an existing posting, unless that posting has expired on the target 
newsserver or was canceled (with Supersedes being a special case).  The 
In-Reply-To header field (see below) is not allowed there, but some hybrid 
MUA/NUAs like Mozilla Thunderbird set it anyway¹.
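
The rule that the last element of References names the precursor can be 
sketched in a few lines of Python.  This is a minimal illustration, assuming 
a simplified msg-id pattern rather than the full grammar of [RFC5536]; the 
helper names are hypothetical, and the Message-IDs are those of Peter 
Otten's follow-up discussed below, written with "@" where this archive 
shows " at ".

```python
from __future__ import annotations

import re

# Simplified msg-id pattern: "<", no whitespace or angle brackets, ">".
# An assumption for illustration, not the full RFC 5536 grammar.
MSG_ID = re.compile(r"<[^<>\s]+>")


def reference_ids(references: str) -> list[str]:
    """Return the msg-ids in a References value, oldest ancestor first."""
    return MSG_ID.findall(references)


def precursor_id(references: str) -> str | None:
    """Return the Message-ID of the precursor (the last element of
    References), or None if the value contains no msg-id."""
    ids = reference_ids(references)
    return ids[-1] if ids else None


# References of Peter Otten's follow-up, folded over two lines.
refs = ("<lae9489ct99mp704um93sdqlatofb2i8gq@4ax.com>\n"
        " <CAPTjJmpHPE=SdE_XJtdi4DMFVeWa8Exo3Arsu13Hd8fgSuZ5bw@mail.gmail.com>")
print(precursor_id(refs))
# <CAPTjJmpHPE=SdE_XJtdi4DMFVeWa8Exo3Arsu13Hd8fgSuZ5bw@mail.gmail.com>
```

A newsreader threads the article under whatever posting carries that 
Message-ID; if no such posting exists, the link is lost.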

Mail User Agents (MUAs, mailreaders), on the other hand, are currently based 
on [RFC5321], [RFC1939], IMAP4 (various RFCs, starting with [RFC1730]), and 
last but not least [RFC5322].

There are two possible header fields to build a thread of e-mail messages: 
In-Reply-To and References.  The value of the first is supposed to be a 
Message-ID, whereas the value of the second is as described in [RFC5536].  
Few MUAs set both, some set only the first one, and many set neither, 
because for replies they are only recommended (SHOULD), not required (see 
[RFC5322], section 3.6.4).
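
Because a mail message may carry References, In-Reply-To, both or neither, 
a mail-side threader has to fall back from one to the other.  The following 
is a sketch under stated assumptions: the function `parent_id` is 
hypothetical, the msg-id pattern is simplified, and taking the last msg-id 
of In-Reply-To (which may list several parents) is a heuristic.  The sample 
message is made up, reusing a Message-ID from further below with "@" in 
place of the archive's " at ".

```python
from __future__ import annotations

import re
from email import message_from_string
from email.message import Message

MSG_ID = re.compile(r"<[^<>\s]+>")  # simplified msg-id pattern (assumption)


def parent_id(msg: Message) -> str | None:
    """Best guess at the Message-ID a message replies to.

    Prefers the last msg-id in References (the Netnews rule), falls back
    to In-Reply-To, and returns None for messages that set neither field.
    """
    for field in ("References", "In-Reply-To"):
        value = msg.get(field)
        if value:
            ids = MSG_ID.findall(str(value))
            if ids:
                return ids[-1]
    return None


# A made-up reply that sets only In-Reply-To.
reply = message_from_string(
    "In-Reply-To: <50464153.5090402@gmail.com>\n"
    "Subject: Re: python docs search for 'print'\n"
    "\n"
    "body\n"
)
print(parent_id(reply))  # <50464153.5090402@gmail.com>
```

Messages that set neither field can only be threaded by Subject and Date, 
with the problems mentioned above.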

And then there is utterly borken software – or shall we say utterly borken 
approaches?  Consider for example the recent thread with Subject "simple 
client data base" started by Mark R Rivet.  The original posting has:

| User-Agent: ForteAgent/7.00.32.1200

(posted using a newsreader)

| […]
| Message-ID: <lae9489ct99mp704um93sdqlatofb2i8gq at 4ax.com>

Chris Angelico's follow-up to that has

| In-Reply-To: <lae9489ct99mp704um93sdqlatofb2i8gq at 4ax.com>
| References: <lae9489ct99mp704um93sdqlatofb2i8gq at 4ax.com>
| […]
| Message-ID: <mailman.142.1346682533.27098.python-list at python.org>
| […]
| X-Mailman-Version: 2.1.15

(apparently posted using a mailreader, gated by python.org's mail software)

So far, so good.  But Peter Otten's follow-up to Chris Angelico's posting 
has

| References: <lae9489ct99mp704um93sdqlatofb2i8gq at 4ax.com>
|  <CAPTjJmpHPE=SdE_XJtdi4DMFVeWa8Exo3Arsu13Hd8fgSuZ5bw at mail.gmail.com>
| […]
| User-Agent: KNode/4.7.3	

(posted using a newsreader)

| […]
| Message-ID: <mailman.145.1346683813.27098.python-list at python.org>

As you can see, the Message-ID under which Chris' posting appears on Usenet 
does not occur in the References header field value of Peter's posting.  
This is caused by python.org's SMTP-to-NNTP gating program, which sets its 
own Message-ID and ignores the one the message had when it was injected at 
the originating server.  Therefore, although it is a follow-up to Chris' 
posting, Peter's posting has no *technical* (metadata) relation to Chris' 
posting.

Instead, it should have

| References: <lae9489ct99mp704um93sdqlatofb2i8gq at 4ax.com>
|  <mailman.142.1346682533.27098.python-list at python.org>
| […]

or, better: Chris' posting should have had the original

| […]
| Message-ID: 
|   <CAPTjJmpHPE=SdE_XJtdi4DMFVeWa8Exo3Arsu13Hd8fgSuZ5bw at mail.gmail.com>
| […]

(without the line break, which is only there for display); then the header 
fields of Peter's posting could have stayed as they are.
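
The preferred behaviour, keeping the Message-ID a message already has, can 
be sketched as a gateway step that mints a new Message-ID only when the 
incoming mail has none.  This is a hypothetical `gate_to_news` function, not 
the code python.org actually runs; the domain used for newly generated IDs 
is an assumption, and the Message-ID is Chris' original one with "@" in 
place of the archive's " at ".

```python
from __future__ import annotations

from email import message_from_string
from email.message import Message
from email.utils import make_msgid


def gate_to_news(msg: Message, newsgroup: str, id_domain: str) -> Message:
    """Prepare a mailing-list message for injection as a Netnews article.

    Hypothetical sketch: the original Message-ID, which later follow-ups
    will cite in References, is kept; a new one is made only if missing.
    """
    if msg.get("Message-ID") is None:
        msg["Message-ID"] = make_msgid(domain=id_domain)
    # Replace, rather than duplicate, any existing Newsgroups header field.
    del msg["Newsgroups"]
    msg["Newsgroups"] = newsgroup
    return msg


mail = message_from_string(
    "Message-ID: <CAPTjJmpHPE=SdE_XJtdi4DMFVeWa8Exo3Arsu13Hd8fgSuZ5bw@mail.gmail.com>\n"
    "Subject: Re: simple client data base\n"
    "\n"
    "body\n"
)
article = gate_to_news(mail, "comp.lang.python", "python.org")
print(article["Message-ID"])  # unchanged: the gmail.com Message-ID
```

With the Message-ID preserved, a newsreader that builds References from the 
mail copy and one that builds it from the news copy cite the same ID.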

My newsreader (KNode/4.4.11) tries its best to resolve this (short of 
threading by Subject and Date, which does not work; see above), which causes 
Peter's posting to end up as a follow-up to *Mark's* posting instead 
(identified by the only valid Message-ID in the References header field).  
Only when you read Peter's posting do you realize that it is not a follow-up 
to Mark's at all.  Confusion ensues.
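
The fallback can be modelled as walking References from the end and 
attaching the article to the first ancestor the news server actually has.  
This is an assumed model of such a newsreader's behaviour, not KNode's 
actual code; the function name is hypothetical, and the Message-IDs are the 
ones from the headers above, with "@" in place of the archive's " at ".

```python
from __future__ import annotations

import re

MSG_ID = re.compile(r"<[^<>\s]+>")  # simplified msg-id pattern (assumption)


def attach_to(references: str, known_ids: set[str]) -> str | None:
    """Return the nearest ancestor in References that exists locally,
    or None, in which case the article is shown as a thread root."""
    for msg_id in reversed(MSG_ID.findall(references)):
        if msg_id in known_ids:
            return msg_id
    return None


MARK = "<lae9489ct99mp704um93sdqlatofb2i8gq@4ax.com>"
CHRIS_ON_USENET = "<mailman.142.1346682533.27098.python-list@python.org>"
CHRIS_ORIGINAL = "<CAPTjJmpHPE=SdE_XJtdi4DMFVeWa8Exo3Arsu13Hd8fgSuZ5bw@mail.gmail.com>"

# What the news server has: Mark's article, and Chris' under its gated ID.
server = {MARK, CHRIS_ON_USENET}

# Peter's References cite Chris' original ID, which the server never saw.
peter_refs = f"{MARK} {CHRIS_ORIGINAL}"
print(attach_to(peter_refs, server) == MARK)  # True: shown under Mark's
```

Had the gateway kept Chris' original Message-ID, the same lookup would have 
found Chris' posting first.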

There are many similar examples here.  As a result of the Message-ID 
rewriting, in several cases a follow-up even appears as if it were an 
original posting, with no technical (and therefore no obvious visual) 
relation to the thread it actually belongs to, even though the precursor 
has not expired.  For example,

| […]
| X-Original-To: python-list at python.org
| Delivered-To: python-list at mail.python.org
| […]
| In-Reply-To: <50464153.5090402 at gmail.com>
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| References: <50464153.5090402 at gmail.com>
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| Date: Tue, 4 Sep 2012 14:27:35 -0400
| Subject: Re: python docs search for 'print'
| From: Joel Goldstick <joel.goldstick at gmail.com>
| To: David Hoese <dhoese at gmail.com>
| Content-Type: text/plain; charset=UTF-8
| Cc: python-list at python.org
| […]
| Newsgroups: comp.lang.python
| Message-ID: <mailman.185.1346783257.27098.python-list at python.org>
| […]
| 
| On Tue, Sep 4, 2012 at 1:58 PM, David Hoese <dhoese at gmail.com> wrote:
| > […]

There is no message with Message-ID <50464153.5090402 at gmail.com> (at least 
not on the newsserver that I use), because that header field value was 
overwritten by the borken gating software that python.org uses.  The actual 
message posted by that software is:

| […]
| X-Original-To: python-list at python.org
| Delivered-To: python-list at mail.python.org
| […]
| Date: Tue, 04 Sep 2012 13:58:43 -0400
| From: David Hoese <dhoese at gmail.com>
| User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7;
| 	rv:15.0) Gecko/20120824 Thunderbird/15.0
| […]
| To: python-list at python.org
| Subject: python docs search for 'print'
| […]
| Newsgroups: comp.lang.python
| Message-ID: <mailman.184.1346781550.27098.python-list at python.org>
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
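
Such orphaned follow-ups can be detected mechanically: an article that cites 
at least one parent (in References or In-Reply-To) but none of whose cited 
Message-IDs exist among the available articles.  A rough sketch, assuming a 
hypothetical representation of articles as a dictionary from Message-ID to 
header fields; the two entries are the messages quoted above, with "@" in 
place of the archive's " at ".

```python
from __future__ import annotations

import re

MSG_ID = re.compile(r"<[^<>\s]+>")  # simplified msg-id pattern (assumption)


def orphaned_followups(articles: dict[str, dict[str, str]]) -> list[str]:
    """Return Message-IDs of follow-ups whose cited ancestors are all missing.

    `articles` maps each Message-ID to its header fields (name -> value).
    """
    orphans = []
    for msg_id, headers in articles.items():
        cited: set[str] = set()
        for field in ("References", "In-Reply-To"):
            cited.update(MSG_ID.findall(headers.get(field, "")))
        # A follow-up (it cites something) none of whose parents exist here.
        if cited and cited.isdisjoint(articles):
            orphans.append(msg_id)
    return orphans


articles = {
    "<mailman.184.1346781550.27098.python-list@python.org>": {
        "Subject": "python docs search for 'print'",
    },
    "<mailman.185.1346783257.27098.python-list@python.org>": {
        "Subject": "Re: python docs search for 'print'",
        "In-Reply-To": "<50464153.5090402@gmail.com>",
        "References": "<50464153.5090402@gmail.com>",
    },
}
print(orphaned_followups(articles))
# ['<mailman.185.1346783257.27098.python-list@python.org>']
```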

To show that this is not a coincidence, and that I am not imagining things 
here: the same problems started to occur when some people on the 
German-speaking Python mailing list at python.org thought it would be a good 
idea to merge that mailing list with the German-speaking newsgroup 
de.comp.lang.python not so long ago, using the same software.  As a result, 
that newsgroup is a complete mess now, too.

>>    Add to that the abomination that Google Groups has become.
> 
> It's always been an abomination,

After they took over the Dejanews archive, it was rather OK.  You could use 
it with the keyboard, lines were at least automatically wrapped at 80 
columns (unfortunately only when sending, and there was no preview; AFAIK 
there still is none), they removed postings reported as spam, and so forth.

> although I understand it is much, much worse now.

Now you cannot even use it with the keyboard, and postings are not properly 
word-wrapped when typing or submitting (resulting in lines of 200 characters 
and more).  Spam is not removed at all, but only hidden from *Google* 
*Groups* users, which causes it to be distributed on Usenet unchecked unless 
the closest peers of the Google Groups servers happen to employ a suitable 
spam filter, or have at least one dedicated user who runs a killbot.

> Blame Google for that.

I do, and I have been UDP'ing (applying the Usenet Death Penalty to) Google 
Groups since April for that (except for follow-ups to my postings).  
However, I also blame the people who still use it without complaining 
enough, because if they stopped using it, or complained more often and more 
loudly, Google would have to fix it.  Unfortunately, most people do not even 
know where they are posting when they access Usenet via Google Groups, so 
there is little hope that the situation will improve.

But that is another can of worms entirely.

__________
¹  Recent example: <news:k23c3l$ldn$1 at news.albasani.net>

References:

[RFC1730] Crispin, M. "Internet Message Access Protocol - Version 4"
          (IMAP4). December 1994.  <http://tools.ietf.org/html/rfc1730>
[RFC1939] Myers, J. and Rose, M. "Post Office Protocol - Version 3".
          May 1996.  <http://tools.ietf.org/html/rfc1939>
[RFC3977] Feather, C. "Network News Transfer Protocol" (NNTP).
          October 2006.  <http://tools.ietf.org/html/rfc3977>
[RFC5321] Klensin, J. "Simple Mail Transfer Protocol" (SMTP).
          October 2008.  <http://tools.ietf.org/html/rfc5321>
[RFC5322] Resnick, P. (ed.) "Internet Message Format".
          October 2008.  <http://tools.ietf.org/html/rfc5322>
[RFC5536] Murchison, K., Lindsey, C., and Kohn, D.
          "Netnews Article Format". November 2009.
          <http://tools.ietf.org/html/rfc5536>
-- 
PointedEars

Twitter: @PointedEars2
Please do not Cc: me. / Bitte keine Kopien per E-Mail.


