[Mailman-Users] Header Cleanup Script

Scott Courtney courtney at 4th.com
Sun Jun 2 11:09:50 CEST 2002


Well, it turns out there was so much cruft in that data from YahooGroups that
it was easier to write an awk script to zap most of it. Here's the script:

**************** BEGIN LISTING **************
#!/usr/bin/awk -f
#
# Attempts to clean up some ugly header problems when importing
# mail from YahooGroups to mbox format.
#
# Author: Scott Courtney <courtney at 4th.com>
#
# License: GPL
#
# Disclaimer: Written for my own one-time use; NOT thoroughly tested.
#
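BEGIN {
        hdr=0;          # 1 while we are inside a message's header block
}
# mbox "From " separator line: a new message's headers start here.
/^From .*@.* .*:..:.. / {
        hdr=1;
        print $0;
        next;
}
# Blank line: end of the headers (or just a blank line in the body).
/^$/ {
        hdr=0;
        print $0;
        next;   # otherwise the catch-all rule below prints it a second time
}
# Well-formed "Name: value" line: pass it through unchanged.
/^[A-Za-z0-9-]+: / {
        print $0;
        next;
}
# Anything else: inside the headers, treat it as a continuation line and
# indent it by one space; in the body, pass it through unchanged.
{
        if (hdr) {
                print " " $0;
        } else {
                print $0;
        }
}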
********************* END LISTING ****************

Another change that may or may not apply to your lists: Some versions of KMail,
the mail client that comes with KDE, produce a header called "Message-Id:". The
parser in "arch" requires this to be "Message-ID:" or it chokes. I didn't
put that into my awk script because it may not apply everywhere, and fixing
it is just a matter of :%s/^Message-Id/Message-ID/ in vi, or equivalent.
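
If you would rather not do that edit by hand, here is a rough sketch of a
non-interactive equivalent. It is not part of the script above and is only
lightly reasoned through, not tested on real archives. The function name
fix_message_id and the file names are made up for illustration, and it
assumes each message starts with the same style of "From " separator line
that the cleanup script looks for. Unlike the vi substitution, it only
touches lines inside a header block, so a quoted "Message-Id:" in a message
body is left alone.

```shell
# Hypothetical helper (name is an assumption): read an mbox on stdin and
# write it to stdout with "Message-Id:" renamed to "Message-ID:" in
# header blocks only.
fix_message_id() {
    awk '
        /^From .*@.* .*:..:.. / { hdr = 1 }   # separator line: headers begin
        /^$/                    { hdr = 0 }   # blank line: headers end
        hdr { sub(/^Message-Id:/, "Message-ID:") }
        { print }
    '
}

# Usage (file names are placeholders):
#   fix_message_id < list.mbox > list-fixed.mbox
```

Run it on a copy of the mbox first and diff the result before feeding it
to "arch".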

Hope this is helpful.

By the way, I apologize for posting so much today. Several people have been
in touch with me off-list indicating that I'm not the only one struggling
with these problems.

The good news: After running this new awk script, I'm able to import much
larger archives in a single chunk. The 80-message limit was highly
repeatable for me, and I still don't know why, but it's not hard-wired as
I had thought. Maybe just coincidental because all my data is so
homogeneous.

Good luck, everyone. I'm now up and running with four live lists. I hope
this documentation of the hurdles I've encountered will help the next
person in line to not have so many dents in his or her forehead. :-)

To bed, now, at last. :-)

Scott

-- 
-----------------------+------------------------------------------------------
Scott Courtney         | "I don't mind Microsoft making money. I mind them
courtney at 4th.com       | having a bad operating system."    -- Linus Torvalds
http://www.4th.com/    | ("The Rebel Code," NY Times, 21 February 1999)





