[Python-checkins] Insanity! (Re: Python-checkins digest, Vol 1 #346 - 1 msg)

Barry A. Warsaw <bwarsaw@cnri.reston.va.us>
Fri, 4 Feb 2000 14:21:13 -0500 (EST)


>>>>> "PF" == Peter Funk <pf@artcom-gmbh.de> writes:

    PF> Well, I like browsing through the diffs (it's fun to see how
    PF> it grows) and I don't care much about this bandwidth issue as
    PF> long as they are kept below 5 MB or so.  However I would
    PF> prefer unified context diffs (created by diff -u).  They would
    PF> take up only half the space and are (which is more important)
    PF> much much easier to read.

The consensus when we voted last time was for context diffs with 2 lines of context (-C 2).
Personally, I find unifieds harder to read (and I think Guido
agreed).
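For comparison, here is a small sketch of the two formats under discussion. It uses Python's standard difflib module rather than cvs diff (which is what the script actually runs), and renders one changed line both as a context diff and as a unified diff, each with 2 lines of context. The sample file contents are made up for illustration.

```python
import difflib

# Made-up file contents: only the third line changes
old = ["a\n", "b\n", "c\n", "d\n", "e\n"]
new = ["a\n", "b\n", "C\n", "d\n", "e\n"]

# Context diff with 2 lines of context, analogous to `diff -C 2`
context = list(difflib.context_diff(old, new, "old", "new", n=2))

# Unified diff with 2 lines of context, analogous to `diff -U 2`
unified = list(difflib.unified_diff(old, new, "old", "new", n=2))

print("".join(context))
print("".join(unified))
```

For a changed line like this one, the context diff prints the surrounding lines once for the old version and again for the new one, while the unified diff shows them once around the -/+ pair, which is where the space savings come from.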

    PF> I'm not very familiar with CVS, so I would like to know which
    PF> tool/script produces these mails?  I've asked one of my
    PF> employees, who is more familiar with CVS, but he also didn't
    PF> know.

It's a loginfo script that I've hacked together over the years.  It's
not pretty because it deals with lots of CVS, ssh, rsync, and sendmail
peculiarities.  I include it below for your enjoyment.  It would be
trivial to change to unifieds.
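As a rough illustration of how small that change would be: calculate_diff in the script builds its cvs diff command with -C 2, and cvs diff also accepts -u for unified output. The helper diff_command below is hypothetical; the script itself formats the command string inline.

```python
def diff_command(filename, oldrev, newrev, unified=False):
    """Build a cvs diff command line like the one in calculate_diff().

    Hypothetical helper: the original script hard-codes '-C 2'.
    """
    # '-C 2' = context diff with 2 lines of context (the current setting)
    # '-u'   = unified diff (cvs/diff default of 3 lines of context)
    fmt = "-u" if unified else "-C 2"
    return "cvs -f diff %s -r %s -r %s %s" % (fmt, oldrev, newrev, filename)
```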

Invoked by a CVSROOT/loginfo entry like so:

python/dist	/projects/cvsroot/CVSROOT/syncmail --norecurse %{sVv} python-checkins@python.org
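With the %{sVv} format, CVS passes the script a single string: the repository directory followed by one file,oldrev,newrev entry per changed file. The script splits it on whitespace into specs and later splits each entry on commas in calculate_diff. Here is a Python 3 sketch of the same parsing, with a hypothetical parse_spec helper and made-up file names; like the original, it assumes there are no spaces in file or directory names.

```python
def parse_spec(spec):
    """Split a %{sVv} loginfo string into (directory, [(file, old, new), ...]).

    Hypothetical helper mirroring how syncmail handles its first argument.
    """
    words = spec.split()
    directory, entries = words[0], []
    for word in words[1:]:
        parts = word.split(",")
        # Entries without exactly three fields get no diff, just as
        # calculate_diff() returns '' when the split raises ValueError.
        if len(parts) == 3:
            entries.append(tuple(parts))
    return directory, entries
```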

Enjoy,
-Barry

-------------------- snip snip --------------------
#! /depot/sundry/plat/bin/python

# To do:
#  -- why does this seem to take so long for some people?

"""Complicated syncing and notification for CVS checkins.

This script is intended to be run from the CVS loginfo file.  It handles
optional email notifications, fake reply addressing, and remote syncing via
rsync/ssh.  All these things are useful when you run an Open Source software
development process.

Usage:

syncmail [--exclude=EX ...] [--norecurse] [--fg] [--nosync] [--munge] [-h]
         [--cvsroot=<path>] <%%S> [email-addrs ...]

Where:
    --exclude=EX
        Any number of rsync --exclude= options.  These are passed directly to
        the rsync process.

    --norecurse
        Do not do a recursive sync.  Sync only in the specified directory.

    --fg
        Run the rsync command in the foreground.

    --nosync
        Do not run rsync at all.

    --munge
        Munge return addresses.  Ordinarily, email is sent with the
        /usr/ucb/Mail command and the return address is provided by the
        system.  However, when remote maintainers check in changes, their
        addresses will incorrectly appear to come from the machine or
        domain running the CVS repository.  When this option is given,
        /bin/mailx is used as the MUA and the user making the checkin is
        looked up in the /etc/mail/aliases file.  If a match is found, their
        alias expansion is forced as the return address.  Users listed in
        MUNGE_IGNORE are never looked up.

    --cvsroot=<path>
        Set the CVSROOT environment variable to <path>.  Otherwise this
        variable must already exist in the environment.

    --help
    -h
        Print this text.

    <%%S>
        CVS %%s loginfo expansion.  This will be a single string containing
        the directory the checkin is being made in, relative to $CVSROOT,
        followed by the list of files that are changing.  With the %%{sVv}
        format each file entry has the form file,oldrev,newrev, which is
        used to generate the diffs.

    email-addrs
        Any number of email addresses if mail is to be sent.  If omitted, no
        mail is sent.
""" #'

# The shell transport used to run rsync over
SSH_CMD = '/depot/sundry/plat/bin/ssh -i <path-to-key> -l rsync'

# Exclude CVS lock files.  This is required because when the loginfo command
# is executed, CVS is still retaining locks on directories.  If the locks are
# rsync'd over, then the remote repository mirror will be hosed.
EXCLUDES = ['--exclude "#cvs.*"']
NORECURSE_EXCLUDE = '--exclude "/*/*"'

# The location of $CVSROOT on the remote machine
REM_CVSROOT = 'sweetpea:/projects/cvsroot'

# Template for the rsync command
RSYNC_OPTS = '-av --delete'
RSYNC_CMD = '/depot/sundry/plat/bin/rsync %(RSYNC_OPTS)s %(LOC_SYNCDIR)s %(REM_SYNCDIR)s %(XCLUDES)s'

# Notification command
MAIL_CMD = '%(mailcmd)s -s "CVS: %(SUBJECT)s" %(PEOPLE)s'
MAILPROG = '/usr/ucb/Mail'
MUNGE_MAILPROG = '/bin/mailx'
MUNGE_IGNORE = ['bwarsaw']
MUNGE_ALIASFILES = ['/etc/mail/aliases']

DIFF_HEAD_LINES = 20
DIFF_TAIL_LINES = 20
DIFF_TRUNCATE_IF_LARGER = 1000

import os
import sys
import string
import time



def find_aliases(files):
    aliases = {}
    for file in files:
        try:
            fp = open(file)
            lines = fp.readlines()
            fp.close()
        except IOError:
            continue
        # parse the alias entries
        for line in lines:
            line = string.strip(string.lower(line))
            # skip blank lines and comments
            if line == '' or line[0] == '#':
                continue
            # TBD: doesn't handle split lines
            i = string.find(line, ":")
            if i < 0:
                # skip this line
                continue
            alias = line[:i]
            expansion = string.strip(line[i+1:])
            # some expansions are worthless.  drop everything that is a
            # program, file, or distribution list
            if string.count(expansion, "/") or \
               string.count(expansion, "|") or \
               string.count(expansion, ","):
                continue
            # overwrites any previous entry
            aliases[alias] = expansion
    return aliases



def calculate_diff(filespec):
    try:
        file, oldrev, newrev = string.split(filespec, ',')
    except ValueError:
        # No diff to report
        return ''
    # This /has/ to happen in the background, otherwise we'll run into CVS
    # lock contention.  What a crock.
    diffcmd = 'cvs -f diff -C 2 -r %s -r %s %s' % (oldrev, newrev, file)
    fp = os.popen(diffcmd)
    lines = fp.readlines()
    sts = fp.close()
    # ignore the error code: cvs diff exits with status 1 whenever the two
    # revisions differ, which is nearly always the case here
##    if sts:
##        return 'Error code %d occurred during diff\n' % (sts >> 8)
    if len(lines) > DIFF_TRUNCATE_IF_LARGER:
        removedlines = len(lines) - DIFF_HEAD_LINES - DIFF_TAIL_LINES
        del lines[DIFF_HEAD_LINES:-DIFF_TAIL_LINES]
        lines.insert(DIFF_HEAD_LINES,
                     '[...%d lines suppressed...]\n' % removedlines)
    return string.join(lines, '')



def blast_mail(mailcmd, filestodiff):
    # cannot wait for child process or that will cause parent to retain cvs
    # lock for too long.  Urg!
    if not os.fork():
        # in the child
        # give up the lock you cvs thang!
        time.sleep(2)
##        fp = open('/tmp/debug', 'a')
        # cmd is the complete mail command line (MAIL_CMD % vars()) built at
        # module level before this function is called; mailcmd is only the
        # mail program itself
        fp = os.popen(cmd, 'w')
        # the checkin log message arrives on stdin
        fp.write(sys.stdin.read())
        fp.write('\n')
        # append the diffs if available
        for file in filestodiff:
            fp.write(calculate_diff(file))
            fp.write('\n')
        fp.close()
        # doesn't matter what code we return, it isn't waited on
        os._exit(0)



os.environ['RSYNC_RSH'] = SSH_CMD

# scan args for options
recurse = 1
fg = 0
munge = 0
sync = 1
args = sys.argv[1:]
while args:
    arg = args[0]
    if arg[:len('--exclude=')] == '--exclude=':
        i = string.find(arg, '=')
        EXCLUDES = EXCLUDES + ['--exclude', '"%s"' % arg[i+1:]]
        del args[0]
    elif arg == '--norecurse':
        recurse = 0
        del args[0]
    elif arg == '--fg':
        fg = 1
        del args[0]
    elif arg == '--munge':
        munge = 1
        del args[0]
    elif arg == '--nosync':
        sync = 0
        del args[0]
    elif arg[:10] == '--cvsroot=':
        os.environ['CVSROOT'] = arg[10:]
        del args[0]
    elif arg in ('-h', '--help'):
        print __doc__
        sys.exit(0)
    else:
        break

# What follows is the specification containing the files that were modified.
# The argument actually must be split, with the first component containing the 
# directory the checkin is being made in, relative to $CVSROOT, followed by
# the list of files that are changing.
SUBJECT = args[0]
specs = string.split(args[0])
del args[0]

# Docs for CVS state that $CVSROOT will be the root in use
if not recurse:
    # The only way to really do non-recursion correctly, in the face of
    # possible deletions, is to specify the source directory completely
    # including the final slash, specify the target directory completely
    # including the final slash, and explicitly exclude /*/* which tells rsync 
    # to exclude the contents of all subdirectories.  The actual
    # subdirectories themselves will be included though (but that shouldn't be 
    # a problem).  This is necessary because if the source isn't a directory,
    # --delete will no-op.  This is highly oblique; thanks to KLM for the
    # solution!
    EXCLUDES.append(NORECURSE_EXCLUDE)

REPO_DIR = specs[0]
LOC_SYNCDIR = os.path.join(os.environ['CVSROOT'], REPO_DIR)
# make sure there's a trailing slash!!!
if LOC_SYNCDIR[-1] <> '/':
    LOC_SYNCDIR = LOC_SYNCDIR + '/'
REM_SYNCDIR = os.path.join(REM_CVSROOT, REPO_DIR)
# same
if REM_SYNCDIR[-1] <> '/':
    REM_SYNCDIR = REM_SYNCDIR + '/'

XCLUDES = string.join(EXCLUDES)

# exit status; default to 0 in case neither rsync nor mail is run
sts = 0
if sync:
    cmd = RSYNC_CMD % vars()
    if not fg:
        # send both stdout and stderr to /dev/null
        cmd = cmd + ' > /dev/null 2>&1 &'
    print '*****************************************\n', cmd, \
          '\n*****************************************'
    sts = os.system(cmd)

# if no email addresses, do not send mail
if args:
    # possibly munge the addresses
    PEOPLE = string.join(args)
    if munge:
        user = os.environ.get('USER') or os.environ.get('LOGNAME')
        # this is necessary because some addresses just don't work with mailx
        # -r. e.g. -r bwarsaw@cnri.reston.va.us just causes mailx to send the
        # mail to the bit bucket.  I have no idea why.
        retaddr = None
        if user not in MUNGE_IGNORE:
            aliases = find_aliases(MUNGE_ALIASFILES)
            retaddr = aliases.get(user)
        # use mailx, forcing the return address with -r if one was found
        mailcmd = MUNGE_MAILPROG
        if retaddr:
            mailcmd = mailcmd + ' -r ' + retaddr
    else:
        mailcmd = MAILPROG
    cmd = MAIL_CMD % vars()
    if not fg:
        # send both stdout and stderr to /dev/null
        cmd = cmd + ' > /dev/null 2>&1'
    # Now do the mail command
    print 'Mailed', PEOPLE
    blast_mail(mailcmd, specs[1:])
    sts = 0

sys.exit(sts)
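The truncation in calculate_diff keeps the first DIFF_HEAD_LINES and the last DIFF_TAIL_LINES lines of any diff longer than DIFF_TRUNCATE_IF_LARGER lines, and replaces the middle with a marker. Here is a standalone Python 3 sketch of the same logic with the limits passed as parameters; the function name truncate_diff is hypothetical.

```python
def truncate_diff(lines, head=20, tail=20, limit=1000):
    """Return a copy of lines, eliding the middle if there are more than limit."""
    if len(lines) <= limit:
        return list(lines)
    removed = len(lines) - head - tail
    # Keep the first `head` and last `tail` lines with a marker in between.
    # Slicing from len(lines) - tail (not -tail) also behaves when tail == 0.
    return (list(lines[:head])
            + ["[...%d lines suppressed...]\n" % removed]
            + list(lines[len(lines) - tail:]))
```

As in the original, a diff of exactly limit lines is left intact; only longer ones are cut.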