[Mailman-Users] Download pipermail archives, convert to mbox file (script)

John Magolske listmail at b79.net
Wed Oct 17 21:45:04 CEST 2012


Hi,

Sometimes before subscribing to a list I like to download the archives
and convert them into an mbox file for nice threaded browsing &
searching using familiar tools (for me, Mutt and mairix). Not finding
an automated way to do this [1], I put together the following shell
script. Simple & rough, but seems to do the job:

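  #!/bin/sh
  # automated retrieval of pipermail archives & conversion to mbox file
  # Last edit: 2012/10/09 Tue 23:16 PDT
  # usage: piperget http://example.tld/pipermail/somelistname/
  # assumes the archive URL path is /pipermail/<listname>/, since
  # wget -nH saves files under /tmp/<URL path> without the host name
  listname=$(basename "$1")
  cd /tmp || exit 1
  # quote the pattern so the shell doesn't expand it against files in /tmp
  wget -r -l 1 -nH -A '*.txt.gz' "$1"
  cd "/tmp/pipermail/$listname" || exit 1
  touch "$listname.mbox"
  chmod 600 "$listname.mbox"
  # glob only the archives (ls would also list the .mbox itself);
  # the order is alphabetical, not chronological
  for f in *.txt.gz
  do
    [ -e "$f" ] || continue
    # pipermail writes addresses as "user at host"; restore the @ on From lines
    zcat "$f" | iconv -f iso8859-15 -t utf-8 | sed 's/^\(From.*\) at /\1@/' >> "$listname.mbox"
  done
  rm -f ./*.txt.gz
  mutt -f "/tmp/pipermail/$listname/$listname.mbox"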

I call this script piperget, and by doing:

  piperget http://example.tld/pipermail/somelistname/

the file /tmp/pipermail/somelistname/somelistname.mbox is created and opened by
mutt. If I like what I see, I move the mbox file to an appropriate
location in my Mail directory, subscribe to the list, and filter the
list traffic into that mbox.
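One quirk: the script concatenates the archives in alphabetical order, and pipermail's monthly names (2012-April.txt.gz, 2012-October.txt.gz, ...) don't sort chronologically that way. Mutt re-sorts messages for display anyway, so this mostly doesn't matter, but if the order inside the mbox file is important, a helper along these lines could feed the loop instead. It's only a sketch: it assumes monthly archives named YYYY-Monthname.txt.gz, and `chrono_sort` is a made-up name.

```shell
#!/bin/sh
# Print the given pipermail archive file names in chronological order.
# Assumes monthly archives named YYYY-Monthname.txt.gz; any other name
# gets a key ending in 99, so it still sorts, just not meaningfully.
chrono_sort() {
  for f in "$@"; do
    base=${f%.txt.gz}
    year=${base%%-*}
    case ${base#*-} in
      January) m=01 ;; February) m=02 ;; March) m=03 ;; April) m=04 ;;
      May) m=05 ;; June) m=06 ;; July) m=07 ;; August) m=08 ;;
      September) m=09 ;; October) m=10 ;; November) m=11 ;; December) m=12 ;;
      *) m=99 ;;
    esac
    # emit "YYYYMM filename", sort on that key, then drop the key
    printf '%s%s %s\n' "$year" "$m" "$f"
  done | sort | cut -d' ' -f2-
}
```

In the script, `for f in $(chrono_sort *.txt.gz)` would then replace the original `for` line; archive names contain no spaces, so the word splitting is harmless.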

This could be made more robust and tweaked to better suit varying
needs. Being able to specify a range of archive dates would be
nice. Another thought is to have the option of leaving the last few
*.txt.gz files lying around (somewhere other than in /tmp), checking
against them to only wget new archives or an archive with a newer
time-stamp, then concatenating newer messages onto the existing mbox:
a sort of pseudo-subscription to a list. Repeatedly re-downloading
an entire monthly/quarterly archive every time it changes would waste
bandwidth, though; it's better to subscribe and have new messages
delivered into the *.mbox via SMTP. I'm not sure whether there's some
rsync-like way to incrementally download only the parts of an archive
that have changed... Anyhow, mostly I just use this to catch up on a
list at the moment of deciding whether or not to subscribe to it. Any
thoughts or suggestions are welcome.
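A rough sketch of the pseudo-subscription idea combined with a date range, assuming monthly archives named YYYY-Month.txt.gz encoded as ISO-8859-15 (as in the script above). It keeps the downloaded *.txt.gz files in a persistent directory (the hypothetical ~/.piperget/<listname>) and relies on wget's -N (--timestamping) option, which only re-retrieves a file when the server reports a newer copy than the local one; the mbox is then rebuilt locally from the kept archives. The function names and the FROM/TO arguments in YYYYMM form are inventions for illustration. It still re-fetches a whole month's archive whenever that month changes, so it doesn't solve the bandwidth problem.

```shell
#!/bin/sh
# Hypothetical incremental variant of piperget (sketch only).
# Assumes monthly archives named YYYY-Month.txt.gz.

# Print a sortable YYYYMM key for an archive file name; 000000 if unrecognised.
archive_key() {
  base=$(basename "$1" .txt.gz)
  year=${base%%-*}
  case ${base#*-} in
    January) m=01 ;; February) m=02 ;; March) m=03 ;; April) m=04 ;;
    May) m=05 ;; June) m=06 ;; July) m=07 ;; August) m=08 ;;
    September) m=09 ;; October) m=10 ;; November) m=11 ;; December) m=12 ;;
    *) echo 000000; return ;;
  esac
  echo "$year$m"
}

# piperget_sync URL [FROM [TO]]   (FROM/TO are inclusive YYYYMM bounds)
piperget_sync() {
  url=$1
  from=${2:-000000}
  to=${3:-999999}
  listname=$(basename "$url")          # trailing slash is ignored
  dir="$HOME/.piperget/$listname"      # hypothetical persistent location
  mkdir -p "$dir" || return 1
  # -N: only fetch archives newer than the local copies; -nd -P: flat into $dir
  wget -q -r -l 1 -nd -N -A '*.txt.gz' -P "$dir" "$url"
  mbox="$dir/$listname.mbox"
  : > "$mbox"                          # rebuild the mbox from scratch
  chmod 600 "$mbox"
  for f in "$dir"/*.txt.gz; do
    [ -e "$f" ] || continue
    k=$(archive_key "$f")
    # keep only archives inside the requested date range
    if [ "$k" -ge "$from" ] && [ "$k" -le "$to" ]; then
      printf '%s %s\n' "$k" "$f"
    fi
  done | sort | cut -d' ' -f2- | while read -r f; do
    # same conversion as the original script: decompress, recode, fix From lines
    zcat "$f" | iconv -f iso8859-15 -t utf-8 | sed 's/^\(From.*\) at /\1@/'
  done >> "$mbox"
  echo "$mbox"
}
```

Usage might look like `mutt -f "$(piperget_sync http://example.tld/pipermail/somelistname/ 201201 201212)"`, which would open an mbox holding only the 2012 archives, in chronological order. Running it again later downloads only the archives whose time-stamps changed.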

[1] After writing this script I did find:
    https://github.com/wesleyd/pipermail-archive-to-maildir 
    Which could be another option for those interested in the
    maildir format. I prefer mbox for mailing lists.

John


-- 
John Magolske
http://B79.net/contact

