[Mailman-Users] Importing large archives ... design limit hit, and possible bug

Scott Courtney courtney at 4th.com
Sun Jun 2 16:15:16 CEST 2002


On Sunday 02 June 2002 10:00 am, LuKreme wrote:
> On Saturday, June 1, 2002, at 09:21 PM, Scott Courtney wrote:
> > On Saturday 01 June 2002 10:59 pm, LuKreme wrote:
> >> Out of curiosity, how did you split the mbox?  I have about 1200 emails
> >> I want to add to the archive.
> >
> > I wrote a little "awk" program to split them into 80-message chunks. Here
> > is the source code:
>
> Ah.. awk.  I hate awk.  Good thing you wrote it. :)

I love awk! You probably would use <gasp!> <ugh!> .... Perl.
(I mean that humorously, by the way. Not trying to start a flamewar or
anything. Linux has lots of good tools, and it's great that each of us
can choose the ones we like best. <GRIN>)

[...]
>
> All the emails got loaded (thanks!) but I'm still getting errors when it's
> trying to finish.
>
> ******
> Updating index files for archive [2002-June]
>    Date
>    Subject
>    Author
>    Thread
> Computing threaded index
> Updating HTML for article 52
> article file /Users/mailman/archives/private/list/2002-June/000052.html is
> missing!

My suggestion now is to do the following:

1. Fix up the "From " --> "rom " errors, since that is a known, obvious, and
   severe problem. You've probably already done that.
2. Read my later emails. I found a better way to deal with the archives at
   my end, namely by fixing the data so that "arch" doesn't bail out due to
   excessive errors. That appears to have been the root cause of my problem:
   bad input, combined with "arch" not having enough error diagnostics. Once
   I added some new error reports to "arch", I started getting answers that
   led me to the problem.
3. Consider using my *other* awk program, goodheaders.awk, to filter a copy
   of your data, then try the import as one single file. Steps for this:

       cd /Users/mailman
       cp archives/private/mylist.mbox/mylist.mbox mylist.mbox.original
       ./goodheaders.awk < mylist.mbox.original > mylist.mbox.filtered
       cp mylist.mbox.filtered archives/private/mylist.mbox/mylist.mbox
       rm -r archives/private/mylist/*
       bin/arch mylist
       cron/nightly_gzip mylist

   Do these things with your qrunner and cron tasks temporarily halted.
   The "rm" command will zap all the old HTML files so you can rebuild
   from scratch. It also zaps the stateful information from previous
   runs.
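
For step 1, and as a sanity check around step 3, something like the
following shell helpers might be useful. This is only a rough sketch, not
part of Mailman and not my awk programs: the function names are made up for
illustration, and it assumes the damaged separators still have the usual
"address Day Mon ..." shape after the missing "F". It counts the "From "
separators, lists the lines that look damaged, and writes a repaired copy
without touching the original file.

```shell
#!/bin/sh
# Hypothetical helpers for checking an mbox file before running bin/arch.
# The names are illustrative only; none of this ships with Mailman.

# Count mbox separator lines (lines that start with "From ").
count_messages() {
    LC_ALL=C grep -c '^From ' "$1"
}

# List, with line numbers, lines that look like separators that lost
# their leading "F". Assumes the shape "rom address Day Mon ...".
find_damaged_separators() {
    LC_ALL=C grep -n '^rom [^ ][^ ]*  *[A-Z][a-z][a-z] [A-Z][a-z][a-z] ' "$1"
}

# Write a repaired copy to $2, restoring the "F" only on lines that
# match the separator shape above. The input file $1 is left untouched.
repair_separators() {
    LC_ALL=C sed 's/^rom \([^ ][^ ]*  *[A-Z][a-z][a-z] [A-Z][a-z][a-z] \)/From \1/' "$1" > "$2"
}
```

A cautious way to use it is to run find_damaged_separators on the mbox
first and eyeball the output, then run repair_separators into a new file
and compare count_messages on the old and new copies. The count on the
repaired copy should go up by the number of lines you just reviewed.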

This worked quite well for me. I'm now mostly done transferring my lists,
and the remainder is just mechanics, not troubleshooting. It was 0500 here
and I was ready to get some sleep. ;-)

Good luck!

Scott

-- 
-----------------------+------------------------------------------------------
Scott Courtney         | "I don't mind Microsoft making money. I mind them
courtney at 4th.com       | having a bad operating system."    -- Linus Torvalds
http://www.4th.com/    | ("The Rebel Code," NY Times, 21 February 1999)





