[Mailman-Users] big lists, big messages

Sun May 13 07:43:14 CEST 2001

On Sat, 12 May 2001 19:20:21 -0700 (PDT) 
tib  <tib at tigerknight.org> wrote:

> To take another approach, mail out a link to the newsletter rather
> than the ENTIRE newsletter to each person. Do the math; if you're
> mailing out a letter that's 30k to 10,000 users, that's gonna be
> 300 megs of data that's getting pumped through your system...

Your math is off as it ignores RCPT-TO envelope size.
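
One way to read that objection, as a rough sketch: the message body
crosses the wire once per SMTP transaction, not once per subscriber,
so the total depends on how many RCPT TO addresses ride in each
envelope.  The function name, the 40-byte allowance per RCPT TO line
and treating 30k as 30,000 bytes are all assumptions made for
illustration.  It counts only the handoff to the MTA, which may well
split bundles again per destination.

```python
import math


def broadcast_bytes(message_bytes, recipients, rcpts_per_envelope,
                    rcpt_line_bytes=40):
    """Rough byte count for handing one list broadcast to the MTA.

    rcpt_line_bytes is an assumed average size of one
    "RCPT TO:<address>" line; it is not a measured figure.
    """
    # The body is sent once per SMTP transaction (envelope).
    transactions = math.ceil(recipients / rcpts_per_envelope)
    body_total = transactions * message_bytes
    # Every recipient still costs one RCPT TO line.
    envelope_total = recipients * rcpt_line_bytes
    return body_total + envelope_total


if __name__ == "__main__":
    for bundle in (1, 5, 25):
        total = broadcast_bytes(30_000, 10_000, bundle)
        print(f"{bundle:>3} RCPT TO per envelope: {total / 1_000_000:.1f} MB")
```

The quoted 300 meg figure is the one-recipient-per-envelope case.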

> ... on a weekly basis, with each one having a possibility of
> becoming corrupt or having other failures in transfer.

1) If your messages are getting corrupted, AT ALL, you have far more
serious problems than how fast your system is able to deliver a list
broadcast.  Something is fundamentally broken and that needs to be
fixed, now, before you start worrying about much else.

2) Transfer failures, given a good MTA and a reasonable choice of
RCPT TO bundle size, should cause minimal problems in delivery rates
for the list broadcast.  Empirical testing here, for my admittedly
very atypical membership/domain distribution, suggests that between
5 and 25 is my sweet spot under Postfix.  Chuq IIRC has found for his
load under Sendmail that somewhere in the 30 range is his sweet
spot.  Your mileage will vary.
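
For anyone unsure what a RCPT TO bundle looks like in practice, here
is a minimal sketch of splitting a membership into bundles.  Grouping
by domain first is my assumption about a sensible policy, not a
description of what Mailman or any particular MTA does, and
bundle_recipients and max_rcpts are made-up names.

```python
from collections import defaultdict


def bundle_recipients(addresses, max_rcpts=25):
    """Split addresses into bundles of at most max_rcpts each.

    Addresses are grouped by domain first (an assumed policy), so
    each bundle holds recipients for a single domain.
    """
    by_domain = defaultdict(list)
    for addr in addresses:
        # Everything after the last "@" is taken as the domain.
        domain = addr.rpartition("@")[2].lower()
        by_domain[domain].append(addr)

    bundles = []
    for domain in sorted(by_domain):
        group = by_domain[domain]
        # Slice each domain's recipients into fixed-size chunks.
        for start in range(0, len(group), max_rcpts):
            bundles.append(group[start:start + max_rcpts])
    return bundles


if __name__ == "__main__":
    members = ["a@example.org", "b@example.net", "c@example.org"]
    for bundle in bundle_recipients(members, max_rcpts=2):
        print(bundle)
```

Each bundle would then go out as one SMTP transaction with one RCPT
TO per address.  The right max_rcpts is something to find by testing
against your own membership, as above.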

3) If delivery failures are clogging your MTA queue and are
noticeably slowing delivery rates, you need to start thinking about
reviewing your MTA configuration or using a different and more
intelligent MTA.

> What about turning the newsletter into a webarticle that you post
> on the net somewhere and send out just the link to all those 10k
> subscribers. First of all it'll cut your data output through the
> mailserver IMMENSELY, a 1k message that goes out rather than 30k
> will only end up being 10 megs rather than 300. 

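Actually, list servers are generally disk I/O bound, and the primary
factor in that disk I/O is open/close/unlink time, not read/write
time.  Shrinking each message from 30k to 1k does not reduce the
number of files the system has to create and remove.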

> Second, the time it takes to SEND that batch of messages will be
> drastically reduced. And last, if you have to make any changes to
> the message or find a critical editing error AFTER it's out, you
> can correct it in one place (the single web page) rather than
> having to mail out an error correction message to those 10k people
> all over again.

This assumes of course that the audience has web access, and in
particular has web access at the time and on the device they would
normally read the messages.  

  Example: It wouldn't work for me reading on the train on my
  laptop.

> Downside? 

You cut off a potentially significant percentage of your audience.

You make third-party archiving and disconnected analysis and review
of your material more difficult.

You make automated processing of your content (i.e. email-driven
automation systems) much more difficult.

You turn a low bandwidth, disconnected, permanent operation (to the
user, the mail arrives asynchronously with his other operations),
into a high bandwidth connected operation which is explicitly
transitory (close the browser and it's gone).

There are reasons I don't subscribe to lists as you describe above.

> I don't really see any big ones. 

I subscribe to something over 130 lists at this point.  I don't read
them all, and I don't even try and keep up with many of them.
However, I subscribe to them as I'm interested in something about
them, so I *do* want to participate to at least some degree.

So I automate and reduce the problem.

  Procmail files all list messages into the appropriate per-list
  folders.  As it files each message it also compares the message
  against a list of key words for that list, and if the message
  matches, it also files a copy of the message in a folder called
  <listname>-interesting.

  I then read <listname>-interesting.

  Should I reply to a message on the list, or send a message to the
  list (from <listname>-interesting or wherever), procmail will
  notice when it receives that message and goes to file it in the
  appropriate folders.  Noticing that it is from me, it logs the
  MessageID into a DB.

  Any future messages for that list are also checked for that
  MessageID in their References: and In-Reply-To: headers, and if
  there's a match, that message's MessageID is dropped into the DB
  and a copy of it is dropped into -interesting.
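
The setup itself is procmail recipes, which I haven't reproduced
here.  As a rough sketch of the same logic, assuming a plain in-memory
set stands in for the DB and that key words are matched as simple
substrings (classify and all of its parameter names are made up for
illustration):

```python
import re
from email import message_from_string

# Matches one <message-id> token inside a header value.
MSGID_RE = re.compile(r"<[^<>\s]+>")


def classify(raw_message, listname, keywords, my_address, seen_ids):
    """Return the folders one list message should be filed into.

    seen_ids is a set standing in for the MessageID DB; it is
    updated in place.
    """
    msg = message_from_string(raw_message)
    folders = [listname]  # everything goes to the per-list folder

    msgid = (msg.get("Message-ID") or "").strip()
    thread_headers = " ".join(
        (msg.get("References") or "", msg.get("In-Reply-To") or ""))
    refs = set(MSGID_RE.findall(thread_headers))

    from_me = my_address.lower() in (msg.get("From") or "").lower()
    in_thread = bool(refs & seen_ids)

    # Track my own posts and anything replying into a tracked thread.
    if msgid and (from_me or in_thread):
        seen_ids.add(msgid)

    # Substring match over headers and body alike (an assumption).
    text = raw_message.lower()
    has_keyword = any(word.lower() in text for word in keywords)

    if in_thread or has_keyword:
        folders.append(listname + "-interesting")
    return folders
```

Feed it each incoming message in arrival order with the same
seen_ids set, and a thread I have posted to keeps landing in
-interesting for as long as replies chain back to it.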

Result?

  I get to read what I'm interested in on the list, and any time I
  post to the list, I get to see everything on that thread until it
  dies.  Meanwhile the rest of the list passes me silently by.  I
  can of course go read the main list folder any time I want, which
  I do periodically to update the key word lists -- but usually it's
  enough to just read -interesting.

That sort of automation would be simply impossible with the web-based
distribution you describe.

-- 
J C Lawrence                                       claw at kanga.nu
---------(*)                          http://www.kanga.nu/~claw/
The pressure to survive and rhetoric may make strange bedfellows