[Mailman-Developers] Regarding Handlers/SMTPDirect.py and "chunkify"

Stefan Förster cite at incertum.net
Mon May 12 23:59:06 CEST 2008


On 12.05.2008 at 23:20, Mark Sapiro wrote:
> I understand what you are saying, but I wonder what the real world
> difference would be. As currently written, chunkify returns at most 4
> partially filled chunks. Granted, 4 is significantly bigger than one,
> but given that the MTA is VERPing the deliveries, it may ultimately
> create an outgoing queue entry for each recipient anyway, so the extra
> 3 on the inbound side doesn't seem that significant (and it might
> increase parallelism in the MTA).

First of all, I just noticed that the official code does indeed create
at most 4 partially filled buckets. That's the problem when you have to
jump in for someone else: my local SMTPDirect.py contains 26 TLDs.
Two thoughts:

1. Even with only four buckets, a real world distribution of recipient
addresses means up to four times the I/O needed. The ratio gets better
as the number of list subscribers grows, but if there are fewer
recipients than SMTP_MAX_RCPTS, it's exactly 1:4 as soon as all four
buckets are in use.
2. Why split recipients the way it's done now at all? You either have
to add new buckets (add new TLDs) or have all recipients outside the
hard-coded TLDs thrown into the same bucket. I could understand it if
you first built a list of the TLDs actually involved and sorted by
those - though I don't know whether that's a good idea if you run a
really large list and have to examine all recipients...
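
To make the second point concrete, here is a rough sketch of grouping by
whatever TLDs actually occur in the recipient list instead of a
hard-coded table. It is not the Mailman code: the function name
chunkify_by_seen_tld is made up for this example, and treating
everything after the last dot as the TLD is an assumption.

```python
def chunkify_by_seen_tld(recips, chunksize):
    """Group recipients by the TLDs that actually occur, then chunk each group.

    Hypothetical helper for illustration, not part of Mailman.
    """
    groups = {}
    for addr in recips:
        # Assumption: everything after the last dot is the TLD.
        # Addresses without a dot end up in the '' group.
        i = addr.rfind('.')
        tld = addr[i + 1:].lower() if i >= 0 else ''
        groups.setdefault(tld, []).append(addr)

    chunks = []
    for tld in sorted(groups):
        group = groups[tld]
        # Each group is chunked on its own, so every TLD can leave
        # one partially filled chunk behind.
        for start in range(0, len(group), chunksize):
            chunks.append(group[start:start + chunksize])
    return chunks
```

This needs a full pass over all recipients, and with many distinct TLDs
it leaves even more partial chunks than a fixed table, so it probably
only makes sense if chunks are also filled across group boundaries.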

I didn't understand what you said about VERPing and outgoing queue  
entries - surely any MTA will keep track of recipients on a per  
message basis? As for parallelism, I think the best way to ensure fast  
delivery is to make all target destinations known to the MTA as fast  
as possible.

> Given your 25000 member list, and assuming SMTP_MAX_RCPTS = 500, you
> would have at most 54 chunks (and more likely 53 or 52) instead of 50.
>
> In any case, If I were coding this, I would be inclined to not make it
> an option, but just to change chunkify so it still grouped, but
> continued to fill the last chunk of a group from the next group so
> there would be at most one partial chunk.
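
If I read this suggestion correctly, it amounts to something like the
following sketch. This is only an illustration, not a patch: the names
chunkify_fill_through and tld_group are made up, and the values in
EXAMPLE_GROUPS are example values, not necessarily what any Mailman
version ships.

```python
# Example grouping table; the values are only an illustration.
EXAMPLE_GROUPS = {'com': 1, 'net': 2, 'org': 2, 'edu': 3, 'us': 3, 'ca': 3}


def tld_group(addr):
    """Return the group number for an address; unknown TLDs share group 0."""
    i = addr.rfind('.')
    tld = addr[i + 1:].lower() if i >= 0 else ''
    return EXAMPLE_GROUPS.get(tld, 0)


def chunkify_fill_through(recips, chunksize, group_of=tld_group):
    """Order recipients by group, then cut chunks across group boundaries."""
    # sorted() is stable, so recipients keep their relative order
    # within each group.
    ordered = sorted(recips, key=group_of)
    # Chunks are cut from the ordered list as a whole, so only the
    # very last chunk can be partially filled.
    return [ordered[i:i + chunksize]
            for i in range(0, len(ordered), chunksize)]
```

Recipients of the same group still end up next to each other, but a
25000 member list with a chunk size of 500 yields exactly 50 chunks.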

For now, I have changed the code to simply return SMTP_MAX_RCPTS
recipients per chunk - or all recipients if there are fewer than that.
Hardcoded, not configurable. The way it is done now, I can't see any
real advantages - especially living outside the U.S. Either improve the
sorting algorithm (all TLDs, don't return partial chunks) or make it
configurable to skip sorting altogether, and have it default to flat
chunking. At least that's what I feel would be an improvement. It saves
CPU time and I/O operations, and gives the MTA's queue manager more
time to do its job.
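
A minimal sketch of what I mean by flat chunking. The function name
chunkify_flat is made up for this example, and this is a simplified
illustration rather than the exact change in my local copy:

```python
def chunkify_flat(recips, chunksize):
    """Split recipients into consecutive chunks of at most chunksize.

    Hypothetical helper for illustration: no sorting, no grouping.
    """
    if chunksize <= 0:
        raise ValueError('chunksize must be positive')
    recips = list(recips)
    # Only the last chunk can be partially filled.
    return [recips[i:i + chunksize]
            for i in range(0, len(recips), chunksize)]
```

For a list with fewer than SMTP_MAX_RCPTS recipients this always
returns a single chunk, regardless of how the addresses are distributed
across TLDs.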


Cheers
Stefan
-- 
Stefan Förster     http://www.incertum.net/     Public Key: 0xBBE2A9E9
Written on OSX. Who ate my ~/.signature?


