[Mailman-Users] mhonarc/glimpse/wilma tips and tools

Todd Pfaff pfaff at edge.cis.mcmaster.ca
Wed Apr 5 05:07:58 CEST 2000


i've been successfully using mailman with mhonarc, glimpse and wilma for
searchable archives for some time now.  i've been meaning to send to the
list the details of how i did this and some useful tools but i've been
short on time lately.

i'll try to give a quick overview of what i do here, and maybe i can flesh
this out into a more useful howto as i see what kinds of questions/feedback
i get from others.  i've probably left out lots of little details here
so i'll try to answer any specific questions as time permits.

first of all, i create the directory structure:

~mailman/local/
~mailman/local/bin/
~mailman/local/bin/archive-update
~mailman/local/bin/wilma-list-mboxify
~mailman/local/bin/wilma-list-regen-html
~mailman/local/bin/wilma-list-setup
~mailman/local/bin/wilma-list-update
~mailman/local/src/
~mailman/local/glimpse/
~mailman/local/mhonarc/
~mailman/local/wilma/
~mailman/local/wilma/.glimpse_exclude
~mailman/local/wilma/.glimpse_filters
~mailman/local/wilma/wilma_template.cf
~mailman/local/wilma/wilma_template.rc

where i put all the local bits related to our mailman installation.
just a matter of personal taste, i know, but it makes it easy to find what
i'm looking for and to move to a different host.  all of the local
mailman tools i've written (for mhonarc, glimpse, wilma etc) rely somewhat
on this directory structure so you'll likely have to modify them for your
site unless you follow the same layout.
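
for anyone starting from scratch, here is a minimal sketch of creating the
directory skeleton listed above.  make_local_layout is a hypothetical helper
name, not one of the attached scripts, and it assumes a mkdir that supports -p.
the files themselves (scripts, templates) still have to be copied in by hand.

```shell
#!/bin/sh
# sketch only: create the ~mailman/local skeleton described above.
# make_local_layout is a hypothetical helper, not one of the attached scripts.
make_local_layout() {
  base=$1
  for d in bin src glimpse mhonarc wilma; do
    mkdir -p "$base/local/$d" || return 1
  done
}

# example: make_local_layout /home/mailman
```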

i've attached a bunch of scripts and config files which live under
~mailman/local on my system.

i've tested all of this using the following environment:

Solaris 7
mailman-1.1
MHonArc-2.4.4
  http://www.oac.uci.edu/indiv/ehood/mhonarc.html
glimpse-4.12.5
  http://glimpse.cs.arizona.edu/
wilma-1.4
  ftp://ftp.hpc.uh.edu/pub/wilma/

in addition, to use the mboxify script, you'll need the formail program
that is part of the procmail package.  i'm using formail from procmail-3.10
which is 1994 vintage.
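
a quick way to confirm the external programs are reachable before going any
further might look like the sketch below.  missing_tools is a hypothetical
helper name, and it assumes a shell that provides the POSIX "command -v"
builtin (an older /bin/sh may not).

```shell
#!/bin/sh
# sketch only: print the name of each listed program that is not on PATH.
# missing_tools is a hypothetical helper name.
missing_tools() {
  for t in "$@"; do
    command -v "$t" >/dev/null 2>&1 || echo "$t"
  done
}

# example: missing_tools formail mhonarc glimpse glimpseindex gzip gzcat
```

no output means everything named was found.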

i'm running all of this on two mailman installations, one of which serves
about 100 quite active lists.  the two installations use different mailman
home directories - one is /usr/local/mailman, the other is /home/mailman -
so everything was written with portability in mind.  the ~mailman/local
directory is identical on both sites.
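
the portability comes from resolving the mailman home directory at run time
rather than hard-coding it.  a minimal sketch of that lookup is below.
home_of is a hypothetical helper name, it assumes an awk that accepts -v
(nawk or any POSIX awk), and it only sees accounts listed in the passwd file
it is given.

```shell
#!/bin/sh
# sketch only: look up a user's home directory in a passwd-format file.
# home_of is a hypothetical helper; the file defaults to /etc/passwd.
# the exact match on field 1 avoids picking up names like "mailman2".
home_of() {
  awk -F: -v u="$1" '$1 == u { print $6; exit }' "${2:-/etc/passwd}"
}

# example: HOME=`home_of mailman`; export HOME
```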

please read all the scripts closely so you're aware of what they're doing
before you run them.

wilma-list-mboxify can be a little confusing.  it uses formail to split
an mbox file into individual messages and then it is called recursively
from formail to process each message.
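
the per-message step boils down to working out which mbox.YYMM file a message
belongs to from its 'From ' separator line.  the sketch below shows just that
calculation for the "Fri Jan  7 01:02:03 2000" date layout.  mbox_suffix is a
hypothetical helper name and is not how the attached script is written (it
uses formail -x and a lookup table instead).

```shell
#!/bin/sh
# sketch only: derive the YYMM suffix from an mbox "From " separator line
# of the form: From user@host Fri Jan  7 01:02:03 2000
# mbox_suffix is a hypothetical helper, not part of the attached script.
mbox_suffix() {
  printf '%s\n' "$1" | awk '{
    # position of the month name in the string gives its number
    pos = index("JanFebMarAprMayJunJulAugSepOctNovDec", $4)
    if (pos == 0 || pos % 3 != 1) exit 1   # unrecognised month name
    printf "%s%02d\n", substr($7, 3, 2), (pos + 2) / 3
  }'
}

# example: mbox_suffix "From user@host Fri Jan  7 01:02:03 2000"
```

a separator line in a different date layout prints nothing, which is a hint
that the alternate year/month lines in wilma-list-mboxify are the ones to use.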

if you're using wilma, diff my wilma_template.* against the originals so
that you're aware of the changes i've made.  pay particular attention to
wilma-list-setup and how it edits these wilma/glimpse template files based
on the list name and other per-list or per-site variables.
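
the template editing is plain sed substitution of placeholders such as _LIST_
and _HOME_.  a cut-down sketch of the idea, handling only those two
placeholders, is below.  fill_template is a hypothetical helper name.  the
values must not contain a comma (the sed delimiter used here) or an
ampersand.

```shell
#!/bin/sh
# sketch only: fill the _LIST_ and _HOME_ placeholders read from stdin,
# in the same style of sed substitution that wilma-list-setup uses.
# fill_template is a hypothetical helper; $1 is the list name, $2 the
# mailman home directory.
fill_template() {
  sed -e "s,_LIST_,$1,g" -e "s,_HOME_,$2,g"
}

# example:
#   fill_template mylist /home/mailman < wilma_template.cf > mylist.cf
```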

in mm_cfg.py you will eventually want something like this:

ARCHIVE_TO_MBOX = 2

PUBLIC_EXTERNAL_ARCHIVER = """
/home/mailman/local/bin/archive-update %(listname)s
"""
PRIVATE_EXTERNAL_ARCHIVER = """
/home/mailman/local/bin/archive-update %(listname)s
"""

while you are setting all this up and getting the mhonarc/glimpse/wilma
stuff working, i suggest you set:

ARCHIVE_TO_MBOX = 1

so that mailman continues to generate mbox archives but does not call
your external mhonarc archiver.  you can always use the various scripts
i've provided here to manually generate the html archives from the mbox
files.

then when you're sure it's all working, set this to 2 to have the html
archives generated at delivery time.
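
since it is easy to lose track of which value is live while switching back
and forth, a small check like the sketch below may help.
archive_to_mbox_value is a hypothetical helper name.  it only looks at plain
"ARCHIVE_TO_MBOX = N" lines that start in column one, and reports the last
one because a later assignment in the file overrides an earlier one.

```shell
#!/bin/sh
# sketch only: print the value of the last uncommented ARCHIVE_TO_MBOX
# assignment in a config file.
# archive_to_mbox_value is a hypothetical helper name.
archive_to_mbox_value() {
  sed -n 's/^ARCHIVE_TO_MBOX[[:space:]]*=[[:space:]]*\(-\{0,1\}[0-9][0-9]*\).*/\1/p' "$1" |
    tail -n 1
}

# example: archive_to_mbox_value $HOME/Mailman/mm_cfg.py
```

the path in the example is an assumption about where mm_cfg.py lives on your
system.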

hope this helps at least get some other people started using mhonarc
with mailman.

--
Todd Pfaff                         \  Email: pfaff at mcmaster.ca
Computing and Information Services  \ Voice: (905) 525-9140 x22920
ABB 132                              \  FAX: (905) 528-3773
McMaster University                   \
Hamilton, Ontario, Canada  L8S 4M1     \
-------------- next part --------------
#!/bin/sh
#
# a wrapper script to be called from mailman during delivery
# as the EXTERNAL_ARCHIVER
#
# since mailman delivery runs as user daemon, group mailman
# we have to ensure here that the path and umask are set.
#
# we also have to make sure that everything in ~mailman/archives
# is group mailman writeable before this script is run.
# this required some changes to umask and modes in
# wilma_template.cf and wilma_reindex.
#
# Todd Pfaff
# pfaff at mcmaster.ca
# Feb 21 2000
#
list=$1
umask 002
# match the whole login name so that e.g. "mailman2" is not picked up too
HOME=`/bin/grep '^mailman:' /etc/passwd | /bin/cut -d: -f6`
export HOME
PATH=/usr/ucb:/usr/sbin:/usr/bin
PATH=/usr/local/bin:$PATH
PATH=$HOME/bin:$PATH
PATH=$HOME/local/wilma:$PATH
PATH=$HOME/local/bin:$PATH
export PATH
# for debugging
logdir=$HOME/logs/`basename $0`
mkdir $logdir >/dev/null 2>&1
log=$logdir/$list
#log=/dev/null
echo archive-update start: `date` > $log 2>&1
wilma-list-update $list >> $log 2>&1
echo archive-update finish: `date` >> $log 2>&1
-------------- next part --------------
#!/bin/sh
#
# split a mailman list mbox file into
# separate mbox.YYMM files suitable for
# the Wilma/Glimpse/MHonArc external archiver
#
# Todd Pfaff
# pfaff at mcmaster.ca
# Feb 14 2000

list=$1
if [ "$list" = "" ]; then
  echo usage: $0 [-r] listname
  exit 1
fi

opt_r=0
if [ "$list" = "-r" ]; then
  list=$2
  if [ "$list" = "" ]; then
    echo usage: $0 [-r] listname
    exit 1
  fi
  opt_r=1
fi

opt_x=0
if [ "$list" = "-x" ]; then
  list=$2
  if [ "$list" = "" ]; then
    echo usage: $0 -x listname
    exit 1
  fi
  opt_x=1
fi

top=$HOME/archives/private/$list.mbox

if [ "$opt_x" -ne 1 ]; then
  if [ "$opt_r" -eq 1 ]; then
    echo "removing old monthly mbox.YYMM files"
    /bin/rm -f $HOME/archives/private/$list.mbox/mbox.*
  fi
  echo "splitting $list.mbox into monthly mbox.YYMM files"
  mbox=$top/$list.mbox
  if /bin/test ! -e $mbox; then
    echo "$mbox does not exist"
    exit 1
  fi
  mkdir $top/.old >/dev/null 2>&1
  mbox2=$top/.old/$list.mbox-`date +%Y%m%d-%H%M%S`-$$
  # beware! there is a small window of opportunity here
  # for mbox to become split in the middle of a delivery
  mv $mbox $mbox2
  gzip $mbox2
  # gzip renames the file to $mbox2.gz, so read that name explicitly
  gzcat $mbox2.gz | \
  formail -s $0 -x $list
  exit 0
fi

tmp=/tmp/`basename $0`-`date +'%Y%m%d'`-$$
cat > $tmp

# select the appropriate lines depending on the date format in the
# 'From ' fields in your mbox files

# From xxx@yyy.zzz Fri Jan  7 01:02:03 2000
year=`cat $tmp | formail -x 'From ' | awk '{print substr($6,3,2)}'`
month=`cat $tmp | formail -x 'From ' | awk '{print $3}'`

# From xxx@yyy.zzz Fri, 7 Jan 2000 01:02:03 -0500 (EST)
#year=`cat $tmp | formail -x 'From ' | awk '{print substr($5,3,2)}'`
#month=`cat $tmp | formail -x 'From ' | awk '{print $4}'`

month=`echo $month | \
  awk 'BEGIN { \
  month["Jan"]="01";\
  month["Feb"]="02";\
  month["Mar"]="03";\
  month["Apr"]="04";\
  month["May"]="05";\
  month["Jun"]="06";\
  month["Jul"]="07";\
  month["Aug"]="08";\
  month["Sep"]="09";\
  month["Oct"]="10";\
  month["Nov"]="11";\
  month["Dec"]="12";\
  } \
  {print month[$1]}'`
mbox=$top/mbox.$year$month
echo $mbox
touch $mbox
cat $tmp >> $mbox
/bin/rm -f $tmp

-------------- next part --------------
#!/bin/sh
#
# Todd Pfaff
# pfaff at mcmaster.ca
# Feb 14 2000
#

list=$1
if [ "$list" = "" ]; then
  echo usage: $0 [-r] listname
  exit 1
fi

opt_r=""
if [ "$list" = "-r" ]; then
  list=$2
  if [ "$list" = "" ]; then
    echo usage: $0 [-r] listname
    exit 1
  fi
  opt_r="-r"
fi

wilma-list-setup $opt_r $list
wilma_reindex $list

-------------- next part --------------
#!/bin/sh
#
# prepare a list for wilma
# Todd Pfaff
# Feb 2000
# 

list=$1
if [ "$list" = "" ]; then
  echo usage: $0 [-r] listname
  exit 1
fi

opt_r=0
if [ "$list" = "-r" ]; then
  list=$2
  if [ "$list" = "" ]; then
    echo usage: $0 [-r] listname
    exit 1
  fi
  opt_r=1
fi

top=$HOME/archives/private/$list
mkdir $top >/dev/null 2>&1

arc_dir=$HOME/archives/private/$list
index_arc_dir=`cd $arc_dir; pwd`
arc_url=/mailman/private/$list
# set wilma list public if mailman archives/public/listname link exists
#[ -h $HOME/archives/public/$list ] && arc_url=/mailman/public/$list

if [ "$opt_r" -eq 1 ]; then
  # clean up existing archives
  # these will all be recreated by wilma_reindex
  echo removing old wilma index and archive files
  /bin/rm -rf $top/index $top/9* $top/0* $top/mbox.*
fi

cd $HOME/local/wilma

mkdir $top/index >/dev/null 2>&1

cp -p .glimpse_exclude $top/index/.glimpse_exclude

cat .glimpse_filters | \
sed -e "1,\$s,_HOME_,$HOME,g" \
> $top/index/.glimpse_filters

/bin/rm -f $top/index.html
cat redirect_index.html | \
sed -e "1,\$s,_LIST_,$list,g" \
> $top/index.html

list_real_name=`config_list -o - $list | grep '^real_name =' | awk '{print $3}' | sed "s/'//g"`

/bin/rm -f .wilma/$list.cf
cat wilma_template.cf | \
sed -e "1,\$s,_LIST_,$list,g" \
    -e "1,\$s,_LIST-REAL-NAME_,$list_real_name,g" \
    -e "1,\$s,_INDEX-ARC-DIR_,$index_arc_dir,g" \
    -e "1,\$s,_ARC-DIR_,$arc_dir,g" \
    -e "1,\$s,_ARC-URL_,$arc_url,g" \
    -e "1,\$s,_HOME_,$HOME,g" \
    -e "1,\$s,_HOST_,edge.cis.mcmaster.ca,g" \
> .wilma/$list.cf

/bin/rm -f .wilma/$list.rc
cat wilma_template.rc | \
sed -e "1,\$s,_LIST_,$list,g" \
    -e "1,\$s,_LIST-REAL-NAME_,$list_real_name,g" \
    -e "1,\$s,_INDEX-ARC-DIR_,$index_arc_dir,g" \
    -e "1,\$s,_ARC-DIR_,$arc_dir,g" \
    -e "1,\$s,_ARC-URL_,$arc_url,g" \
    -e "1,\$s,_HOME_,$HOME,g" \
    -e "1,\$s,_HOST_,edge.cis.mcmaster.ca,g" \
> .wilma/$list.rc

-------------- next part --------------
#!/bin/sh
#
# Todd Pfaff
# pfaff at mcmaster.ca
# Feb 14 2000
#

list=$1
if [ "$list" = "" ]; then
  echo usage: $0 [-r] listname
  exit 1
fi

opt_r=""
if [ "$list" = "-r" ]; then
  list=$2
  if [ "$list" = "" ]; then
    echo usage: $0 [-r] listname
    exit 1
  fi
  opt_r="-r"
fi

wilma-list-mboxify $opt_r $list
wilma-list-regen-html $opt_r $list

-------------- next part --------------
# Web Interface to List Mail Archives configuration file
#
# $Revision: 1.7 $

# [ wilma ]
# Convenience definition of the list name for expansion in other
# definitions. Wilma knows the list name by the name of this
# configuration file.

LIST = _LIST_

# [ wilma ]
# The title text for the list

TITLE = _LIST-REAL-NAME_

# [ wilma wilma_reindex wilma_hiliter ]
# Pathname of the HTML archive directory (not including the archive
# directory itself).

ARC_DIR = _ARC-DIR_

# [ wilma ]
# URL path of the HTML archive directories if different than the file
# system path above. Leave blank if the same.

ARC_URL = _ARC-URL_

# [ wilma_reindex ]
# Pathname of the raw mail list archive directory if different than the
# HTML archive. Leave blank if the same.

ARC_MBOX = _HOME_/archives/private/$LIST.mbox

# [ wilma wilma_reindex ]
# A Perl regular expression returning the year and month numbers
# in $1 and $2, respectively, from the HTML archive directory
# name. Backslashes must be doubled due to configuration variable
# interpolation.

ARC_REGEX  = ^(?:.*/)?((?:19|20)?\\d\\d)(\\d\\d)(\.gz)?$

# [ wilma_reindex ]
# A Perl regular expression returning the year and month numbers in
# $1 and $2, respectively, from the mailbox archive name if different
# than ARC_REGEX. Leave blank if the same. HTML archive directories
# and archive mailbox links are differientated by those file system
# attributes. Backslashes must be doubled due to configuration variable
# interpolation.

MBOX_REGEX = mbox.(\\d\\d)(\\d\\d)(\.gz)?$

# [ wilma wilma_glimpse wilma_hiliter ]
# The e-mail address of the person responsible for the HTML archive or
# mail list.

MAIL_ADDR = $LIST-owner

# [ wilma wilma_glimpse wilma_hiliter ]
# The full name of the person responsible for the HTML archive or mail
# list.

MAIL_NAME = $LIST-owner

# [ wilma ]
# URL of the glimpse help page

GLIMPSE_HELP = http://glimpse.cs.arizona.edu/glimpsehelp.html#sect7

# [ wilma ]
# URL of index page background image. Takes precedence over
# INDEX_BG_COLOR.

#INDEX_BACKGROUND =

# [ wilma ]
# Color name or mask for index page if INDEX_BACKGROUND not defined.

INDEX_BG_COLOR = white

# [ wilma_glimpse ]
# Path to glimpse, which searches the archive indices.

GLIMPSE = _HOME_/local/glimpse/bin/glimpse

# [ wilma_glimpse ]
# Pathname of the archive index directory.

INDEX_DIR = $ARC_DIR/index

# [ wilma_glimpse ]
# Pathname of the archive directory that glimpse sees when it does
# getcwd(3) if different than $ARC_DIR. It can be different due to NFS
# automounts and symbolic links.

INDEX_ARC_DIR = _INDEX-ARC-DIR_

# [ wilma_glimpse ]
# URL of search page background image. Takes precedence over
# SEARCH_BG_COLOR.

#SEARCH_BACKGROUND =

# [ wilma_glimpse ]
# Color name or mask for search page if SEARCH_BACKGROUND not defined.

SEARCH_BG_COLOR = white

# [ wilma_reindex ]
# Octal umask setting while generating HTML archives and indices.
# Defaults to 022 if not set.

UMASK = 002

# [ wilma_reindex ]
# Octal mode setting for index files. Defaults to 0644 if not set.

INDEX_MODE = 0664

# [ wilma_reindex ]
# Pathname of the file signaling that reindexing is in progress.

FLAG_FILE = $ARC_DIR/reindex.lock

# [ wilma_reindex ]
# Path to glimpseindex, which rebuilds the search index for the
# archives.

GLIMPSEINDEX = _HOME_/local/glimpse/bin/glimpseindex

# [ wilma_reindex ]
# Table allocation for glimpseindex (the -M option) in megabytes.
# Defaults to 50 Mb if not set (same as previous versions). If you don't
# have that much memory/swap space, you'll need to specify something
# less.

#GLIMPSE_TABLE = 50

# [ wilma_reindex ]
# Path to mhonarc, which reformats archived mail in HTML.

MHONARC = _HOME_/local/mhonarc/bin/mhonarc

# [ wilma_reindex ]
# Name of the MHonArc rc file.

MHONARC_RCFILE = $LIST.rc
-------------- next part --------------
<!-- MHonArc rcfile template for Wilma $Revision: 1.4 $ -->

<!-- GZIPFILES>
<GZIPEXE>
/usr/local/bin/gzip
</GZIPEXE>

<NODOC>

<SORT>

<MSGSEP>
^From .*  \w{3} \w{3} [ \d]\d
</MSGSEP>

<TLEVELS>
5
</TLEVELS>

<DEFINEVAR CHOP>
TOPFNAME
/mailman/wilma/_LIST_
</DEFINEVAR>

<MULTIPG>
<IDXSIZE>
500
</IDXSIZE>

<IDXFNAME>
index.html
</IDXFNAME>

<TITLE>
Chronological Index
</TITLE>

<IDXPGBEGIN>
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head>
<title>$IDXTITLE$ ($OUTDIR$)</title>
</head>
<body bgcolor="white">
<h1>$IDXTITLE$ ($OUTDIR$)</h1>
</IDXPGBEGIN>

<LISTBEGIN>
$PGLINK(PREV)$ $PGLINK(NEXT)$<br>
<a href="$TIDXFNAME$">[Thread Index]</a>
<a href="$TOPFNAME$">[Archive Top]</a>
<br>
<a href="http://_HOST_/mailman/listinfo/_LIST_/">_LIST-REAL-NAME_ Mailman info page</a>
<hr>
<address>
Last update: $LOCALDATE$<br>
$NUMOFMSG$ messages in chronological order<br>
</address>
<p>
<table>
<tr><th><strong>Subject</strong><hr>
    <th><em>From</em><hr>
    <th>Date<br><hr>
</LISTBEGIN>

<LITEMPLATE>
<tr>
<td>$SUBJECT$
<td><em>$FROMNAME:22$</em>
<td>$MSGLOCALDATE(CUR;%b %d %Y %H:%M)$<br>
</LITEMPLATE>

<LISTEND>
</table>
<p>
<hr>
$PGLINK(PREV)$ $PGLINK(NEXT)$<br>
<a href="$TIDXFNAME$">[Thread Index]</a>
<a href="$TOPFNAME$">[Archive Top]</a>
<br>
<a href="http://_HOST_/mailman/listinfo/_LIST_/">_LIST-REAL-NAME_ Mailman info page</a>
<p>
</LISTEND>

<TTITLE>
Thread Index
</TTITLE>

<TIDXPGBEGIN>
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head>
<title>$TIDXTITLE$ ($OUTDIR$)</title>
</head>
<body bgcolor="white">
<h1>$TIDXTITLE$ ($OUTDIR$)</h1>
</TIDXPGBEGIN>

<THEAD>
$PGLINK(TPREV)$ $PGLINK(TNEXT)$<br>
<a href="$IDXFNAME$">[Chronological Index]</a>
<a href="$TOPFNAME$">[Archive Top]</a>
<br>
<a href="http://_HOST_/mailman/listinfo/_LIST_/">_LIST-REAL-NAME_ Mailman info page</a>
<hr>
<address>
Last update: $LOCALDATE$<br>
$NUMOFMSG$ threaded messages<br>
</address>
<p>
</THEAD>

<TFOOT>
<p>
<hr>
$PGLINK(TPREV)$ $PGLINK(TNEXT)$<br>
<a href="$IDXFNAME$">[Chronological Index]</a>
<a href="$TOPFNAME$">[Archive Top]</a>
<br>
<a href="http://_HOST_/mailman/listinfo/_LIST_/">_LIST-REAL-NAME_ Mailman info page</a>
<p>
</TFOOT>

<MSGPGBEGIN>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML//EN">
<html>
<head>
<title>$SUBJECTNA:72$</title>
<link rev="made" href="mailto:$FROMADDR$">
</head>
<body bgcolor="white">
</MSGPGBEGIN>

<TOPLINKS>
<hr>
$PREVBUTTON$$NEXTBUTTON$
<a href="$IDXFNAME$#$MSGNUM$">[Chronological]</a>
<a href="$TIDXFNAME$#$MSGNUM$">[Thread]</a>
<a href="$TOPFNAME$">[Archive Top]</a>
</TOPLINKS>

<BOTLINKS>
<ul>
$PREVLINK$
$NEXTLINK$
<li>Index(es):
<ul>
<li><a href="$IDXFNAME$#$MSGNUM$"><strong>Chronological</strong></a></li>
<li><a href="$TIDXFNAME$#$MSGNUM$"><strong>Thread</strong></a></li>
</ul>
<br>
<a href="mailto:_LIST_@_HOST_">Reply to: _LIST_@_HOST_</a>
<br>
<a href="http://_HOST_/mailman/listinfo/_LIST_/">_LIST-REAL-NAME_ Mailman info page</a>
</BOTLINKS>


<EXCS override>
apparently
comments
content-length
content-transfer-encoding
content-type
errors-to 
followup
forward 
lines 
message-id
mime- 
nntp- 
originator 
path 
precedence 
priority
received 
replied 
reply-to
return-path 
sender
status 
via 
x- 
</EXCS>

<LABELSTYLES>
-default-
subject:strong
from:strong
to:strong
</LABELSTYLES>

<FIELDSTYLES>
-default-
subject:strong
from:strong
to:strong
keywords:em
newsgroups:strong
</FIELDSTYLES>

-------------- next part --------------
*.html	_HOME_/local/wilma/wilma_striphtml	<

