[Spambayes-checkins] spambayes splitndirs.py,1.2,1.3
Guido van Rossum
gvanrossum@users.sourceforge.net
Fri, 20 Sep 2002 13:00:48 -0700
Update of /cvsroot/spambayes/spambayes
In directory usw-pr-cvs1:/tmp/cvs-serv13883
Modified Files:
splitndirs.py
Log Message:
Another refinement: in order to make nice training sets out of Bruce
G's spam collections, this script now supports multiple input mboxes.
Index: splitndirs.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/splitndirs.py,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** splitndirs.py 20 Sep 2002 19:30:52 -0000 1.2
--- splitndirs.py 20 Sep 2002 20:00:45 -0000 1.3
***************
*** 3,7 ****
  """Split an mbox into N random directories of files.

! Usage: %(program)s [-h] [-s seed] [-v] -n N sourcembox outdirbase

  Options:
--- 3,7 ----
  """Split an mbox into N random directories of files.

! Usage: %(program)s [-h] [-s seed] [-v] -n N sourcembox ... outdirbase

  Options:
***************
*** 84,90 ****
          usage(1, "an -n value > 1 is required")

!     if len(args) != 2:
          usage(1, "input mbox name and output base path are required")
!     inputpath, outputbasepath = args

      outdirs = [outputbasepath + ("%d" % i) for i in range(1, n+1)]
--- 84,90 ----
          usage(1, "an -n value > 1 is required")

!     if len(args) < 2:
          usage(1, "input mbox name and output base path are required")
!     inputpaths, outputbasepath = args[:-1], args[-1]

      outdirs = [outputbasepath + ("%d" % i) for i in range(1, n+1)]
***************
*** 93,110 ****
              os.makedirs(dir)

-     mbox = mboxutils.getmbox(inputpath)
      counter = 0
!     for msg in mbox:
!         i = random.randrange(n)
!         astext = str(msg)
!         #assert astext.endswith('\n')
!         counter += 1
!         msgfile = open('%s/%d' % (outdirs[i], counter), 'wb')
!         msgfile.write(astext)
!         msgfile.close()
!         if verbose:
!             if counter % 100 == 0:
!                 sys.stdout.write('.')
!                 sys.stdout.flush()

      if verbose:
--- 93,111 ----
              os.makedirs(dir)

      counter = 0
!     for inputpath in inputpaths:
!         mbox = mboxutils.getmbox(inputpath)
!         for msg in mbox:
!             i = random.randrange(n)
!             astext = str(msg)
!             #assert astext.endswith('\n')
!             counter += 1
!             msgfile = open('%s/%d' % (outdirs[i], counter), 'wb')
!             msgfile.write(astext)
!             msgfile.close()
!             if verbose:
!                 if counter % 100 == 0:
!                     sys.stdout.write('.')
!                     sys.stdout.flush()

      if verbose:
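
For readers who want to try the multi-mbox behaviour outside the spambayes tree, the loop in revision 1.3 can be approximated as below. This is only a sketch under stated assumptions, not the script itself: it targets Python 3, it swaps the project's mboxutils.getmbox() for the standard library's mailbox.mbox, and the function name split_mboxes and its seed parameter are made up for illustration. The one detail it keeps from the diff is that counter is shared across all input files, so output file names stay unique no matter how many mboxes are given.

```python
import mailbox
import os
import random


def split_mboxes(inputpaths, outputbasepath, n, seed=None):
    """Scatter the messages of several mbox files over n directories.

    Hypothetical helper: the name, signature and return value are
    assumptions made for this sketch, not part of splitndirs.py.
    Returns the total number of messages written.
    """
    rng = random.Random(seed)

    # Same naming scheme as the diff: outputbasepath1 .. outputbasepathN.
    outdirs = [outputbasepath + ("%d" % i) for i in range(1, n + 1)]
    for outdir in outdirs:
        os.makedirs(outdir, exist_ok=True)

    # One counter for all inputs, so file names never collide.
    counter = 0
    for inputpath in inputpaths:
        # Assumption: plain mbox files read with the stdlib mailbox module
        # stand in for mboxutils.getmbox().
        mbox = mailbox.mbox(inputpath, create=False)
        try:
            for msg in mbox:
                i = rng.randrange(n)
                counter += 1
                path = os.path.join(outdirs[i], str(counter))
                with open(path, "wb") as msgfile:
                    msgfile.write(msg.as_bytes())
        finally:
            mbox.close()
    return counter
```

Called as split_mboxes(["spam1.mbox", "spam2.mbox"], "Data/Spam/Set", 5), with made-up paths, this would fill Data/Spam/Set1 through Data/Spam/Set5 with files numbered consecutively across both inputs.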