[Mailman-Developers] chunkify suggestion, with patch.

Darrell Fuhriman darrell@grumblesmurf.net
Mon, 17 Jul 2000 13:59:26 -0700 (PDT)


> Also, one problem with this code is that the domain_sort function is
> really rather slow.  If anyone has suggestions for speeding it up, that'd
> be great.  :)

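OK, I looked at bulk_mailer and realized it has a much faster, simpler,
slicker way of doing this work. So I stole the idea: the patch below
reverses each address, sorts the list, and then reverses the addresses
back, so that addresses at the same domain end up next to each other.  :)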
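For comparison, here is a minimal sketch of the same reverse-and-sort idea for a modern Python 3 interpreter. It is not the patch. It assumes slicing (addr[::-1]) as a sort key in place of the array module, and it returns a new list instead of sorting in place. The name domain_sort_py3 is hypothetical.

```python
def domain_sort_py3(recips):
    """Return a new list of addresses ordered so that addresses at the
    same domain are adjacent.

    Sorting on the reversed string compares the end of each address
    first (the top-level domain), then the rest of the domain, and only
    then the local part.
    """
    # The key is the reversed address; sorted() returns the original strings.
    return sorted(recips, key=lambda addr: addr[::-1])


if __name__ == "__main__":
    addrs = ["bob@example.org", "ann@example.com", "cy@mail.example.com",
             "dee@example.org", "ed@example.com"]
    print(domain_sort_py3(addrs))
    # → ['bob@example.org', 'dee@example.org', 'cy@mail.example.com', 'ed@example.com', 'ann@example.com']
```

Every address at a given domain shares the same reversed prefix (for example "moc.elpmaxe@"), so those addresses form one contiguous run. In this example, mail.example.com also sorts next to example.com. Domain names are case-insensitive, so mixed-case addresses would need lowercasing before they group together. Neither this sketch nor the patch does that.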

Here's a new patch for SMTPDirect.py.  Who decides what gets included in
the release, BTW?
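The patched chunkify sorts the recipients by domain and then walks the list, starting a new chunk every chunksize addresses. Here is a standalone Python 3 sketch of that behaviour; it is an illustration, not the patch itself. Rejecting a chunksize below 1 is an assumption of this sketch, not something the patch does.

```python
def domain_sort(recips):
    # Sort on the reversed address so addresses sharing a domain are adjacent.
    return sorted(recips, key=lambda addr: addr[::-1])


def chunkify(recips, chunksize):
    """Split recips into lists of at most chunksize addresses, after
    grouping them by domain. The last chunk may be shorter."""
    if chunksize < 1:
        # Assumption of this sketch: a non-positive size is a caller error.
        raise ValueError("chunksize must be at least 1")
    ordered = domain_sort(recips)
    # Take consecutive slices of the domain-sorted list.
    return [ordered[i:i + chunksize] for i in range(0, len(ordered), chunksize)]


if __name__ == "__main__":
    addrs = ["a@x.com", "b@y.org", "c@x.com", "d@y.org", "e@x.com"]
    print(chunkify(addrs, 2))
    # → [['b@y.org', 'd@y.org'], ['a@x.com', 'c@x.com'], ['e@x.com']]
```

Note that x.com is split across the second and third chunks. That is the case the "2:30am" comment in the patch leaves for later: keeping a single domain in one chunk would need an extra threshold.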

Darrell


--- Mailman/Handlers/SMTPDirect.py~	Fri Jun  2 21:59:45 2000
+++ Mailman/Handlers/SMTPDirect.py	Mon Jul 17 13:50:16 2000
@@ -106,48 +106,48 @@
 
 
 def chunkify(recips, chunksize):
-    # First do a simple sort on top level domain.  It probably doesn't buy us
-    # much to try to sort on MX record -- that's the MTA's job.  We're just
-    # trying to avoid getting a max recips error.  Split the chunks along
-    # these lines (as suggested originally by Chuq Von Rospach and slightly
-    # elaborated by BAW).
-    chunkmap = {'com': 1,
-                'net': 2,
-                'org': 2,
-                'edu': 3,
-                'us' : 3,
-                'ca' : 3,
-                }
-    buckets = {}
-    for r in recips:
-        tld = None
-        i = string.rfind(r, '.')
-        if i >= 0:
-            tld = r[i+1:]
-        bin = chunkmap.get(tld, 0)
-        bucket = buckets.get(bin, [])
-        bucket.append(r)
-        buckets[bin] = bucket
+    # If we turn down the chunksize (i.e. SMTP_MAX_RCPTS) and have
+    # the addresses sorted by domain, it's much nicer to the MTA and
+    # to the users.  (In the majordomo world, this is what bulk_mailer
+    # would do.)
+    # In an ideal world, a single domain wouldn't be split across
+    # multiple chunks unless some other threshold had been met.
+    # I'll save that for some time when it's not 2:30am.  :)
+
+    recips = domain_sort(recips)
+
     # Now start filling the chunks
     chunks = []
     currentchunk = []
-    chunklen = 0
-    for bin in buckets.values():
-        for r in bin:
-            currentchunk.append(r)
-            chunklen = chunklen + 1
-            if chunklen >= chunksize:
-                chunks.append(currentchunk)
-                currentchunk = []
-                chunklen = 0
-        if currentchunk:
+    for recip in recips:
+        if len(currentchunk) >= chunksize:
             chunks.append(currentchunk)
             currentchunk = []
-            chunklen = 0
+        currentchunk.append(recip)
+    if currentchunk:
+        chunks.append(currentchunk)
     return chunks
+
+
+def domain_sort(recips):
+    # Reverse every address so that sorting groups them by domain
+    for i in range(0,len(recips)):
+        recips[i] = string_reverse(recips[i])
+
+    recips.sort()
 
+    for i in range(0,len(recips)):
+        recips[i] = string_reverse(recips[i])
+    return recips
 
 
+def string_reverse(s):
+    # list/join avoids needing "import array" at the top of the module
+    chars = list(s)
+    chars.reverse()
+    return string.join(chars, '')
+
+
 def pre_deliver(envsender, msgtext, failures, chunkq):
     while 1:
         # Get the next recipient chunk, if there is one