[Mailman-Developers] Listing Lists Faster in 2.0?

Ted Cabeen secabeen@pobox.com
Tue, 04 Apr 2000 09:46:43 -0500


In message <38E9F8D2.10F251D5@uchicago.edu>, Roberto Ullfig writes:
>Roberto Ullfig wrote:
>> 
>> "Barry A. Warsaw" wrote:
>> >
>> > >>>>> "RU" == Roberto Ullfig <rullfig@uchicago.edu> writes:
>> >
>> >     RU> So, in 1.0rc2, displaying the list of lists for 529 lists
>> >     RU> requires 529**2 = 279841 system stat calls and takes over one
>> >     RU> and a half minutes on our Ultra-2 2x296 processor system! Is
>> >     RU> this because of Python, Mailman, or both? Has this been
>> >     RU> "fixed" in 2.0? You really should only need to make one stat
>> >     RU> call per list.
>> >
>> > Uh, it's because of Mailman :)
>> >
>> > I implemented a list_lists scripts which does on the command line what
>> > listinfo.py does in HTML (see attached).  Here's what truss -c gives
>> > me:

<Snip big truss output>

>> > Getting the list of list names, requires at least a listdir() and an
>> > exists() for every directory found there.
>> >
>> > Nothing about this will change for 2.0.
>> >
>> > -Barry
>> 
>> Thanks for the script.
>> 
>> Now this is the truss output for the listinfo that is called by
>> driver:

<Snip more truss output>

Here's what is happening.  When listinfo runs and has to get the list of 
advertised addresses, it starts by getting a list of mailing lists on the 
server using Utils.list_names().  Utils.list_names() requires two stat calls 
for each list on the machine every time it is called.  This is 
understandable and isn't going to change.  It then proceeds to open every one 
of those lists to check on the advertised flag.  Again, no problem.  The 
problem is that when Mailman opens a list in the MailList constructor 
__init__, it checks to make sure that the list exists by running 
Utils.list_names() and seeing if the requested list name is there.  So 
every request to open a list requires two stat calls for every list on the 
system, and when listinfo.py opens every list in sequence, the number of 
stat calls in the list directory grows with the square of the number of lists.
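
A minimal sketch of that squaring effect, to make the arithmetic concrete.  
These are toy stand-ins that only count calls; none of this is Mailman's real 
Utils or MailList code, and the names list_names, open_list, listinfo and 
count_stats below are made up for illustration.  It assumes two stat calls 
per list per scan, as described above.

```python
# Toy model only: stand-ins that count calls, not Mailman's real code.
STATS_PER_LIST = 2  # assumed: two stat calls per list on every scan


class Counter:
    """Holds the running total of simulated stat calls."""

    def __init__(self):
        self.stat_calls = 0


def list_names(all_lists, counter):
    """Stand-in for Utils.list_names(): a full scan of the list directory."""
    counter.stat_calls += STATS_PER_LIST * len(all_lists)
    return list(all_lists)


def open_list(name, all_lists, counter):
    """Stand-in for the existence check in the MailList constructor."""
    if name not in list_names(all_lists, counter):
        raise KeyError(name)
    return name


def listinfo(all_lists, counter):
    """Stand-in for listinfo.py: scan once, then open every list."""
    names = list_names(all_lists, counter)
    return [open_list(name, all_lists, counter) for name in names]


def count_stats(n):
    """Return the simulated stat calls needed to display n lists."""
    counter = Counter()
    listinfo(["list%d" % i for i in range(n)], counter)
    return counter.stat_calls
```

In this model count_stats(n) comes out to 2*n for the first scan plus 2*n*n 
for the per-list checks, so the quadratic term dominates quickly.  The exact 
constant on a real system depends on what the trace counts.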

Solutions are reasonably easy to code.  The first that comes to mind is an 
optional argument to the constructor that indicates that the name has already 
been checked and that checking it again is not necessary.  Another is to 
cache the list of lists on the server, but this means there is a delay 
between when a list is created and when it becomes accessible.  
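
A rough sketch of the optional-argument idea, under stated assumptions.  The 
ListStore class, the check_exists parameter and the open_all helper are all 
hypothetical names, and this MailList is a stand-in, not the real constructor 
signature.

```python
class ListStore:
    """Hypothetical stand-in for the on-disk list directory."""

    def __init__(self, names):
        self._names = list(names)
        self.scans = 0  # how many full directory scans were made

    def list_names(self):
        """Stand-in for Utils.list_names(); each call is one full scan."""
        self.scans += 1
        return list(self._names)


class MailList:
    """Hypothetical stand-in, not Mailman's real MailList class."""

    def __init__(self, name, store, check_exists=True):
        # Callers that just got the name from list_names() can pass
        # check_exists=False to skip the redundant scan.
        if check_exists and name not in store.list_names():
            raise KeyError("no such list: %s" % name)
        self.name = name


def open_all(store):
    """Open every list with a single scan instead of one scan per list."""
    names = store.list_names()
    return [MailList(name, store, check_exists=False) for name in names]
```

Keeping the check on by default means existing callers behave as before, and 
only the code that already holds a fresh list of names opts out.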

I can code something up for you if necessary, but it seems like a reasonably 
simple patch.  Do either you or Barry need a patch?  Let me know if you do.

--
Ted Cabeen           http://www.pobox.com/~secabeen         secabeen@pobox.com
Check Website or finger for PGP Public Key        secabeen@midway.uchicago.edu
"I have taken all knowledge to be my province." -F. Bacon   cococabeen@aol.com
"Human kind cannot bear very much reality."-T.S.Eliot 73126.626@compuserve.com