[Spambayes-checkins] spambayes splitndirs.py,1.2,1.3

Guido van Rossum gvanrossum@users.sourceforge.net
Fri, 20 Sep 2002 13:00:48 -0700


Update of /cvsroot/spambayes/spambayes
In directory usw-pr-cvs1:/tmp/cvs-serv13883

Modified Files:
	splitndirs.py 
Log Message:
Another refinement: in order to make nice training sets out of Bruce
G's spam collections, this script now supports multiple input mboxes.
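
The behaviour described in the log message can be sketched in isolation.
This is a minimal illustration under stated assumptions, not the script
itself. It assumes modern Python 3. The names split_args and distribute are
hypothetical, and plain lists of strings stand in for the mbox objects that
the real script obtains through mboxutils.getmbox.

```python
import os
import random


def split_args(args):
    """Treat the last positional argument as the output base path
    and everything before it as input mbox paths."""
    if len(args) < 2:
        raise ValueError("input mbox name(s) and output base path are required")
    return args[:-1], args[-1]


def distribute(sources, outputbasepath, n, seed=None):
    """Write each message from each source into one of n directories.

    `sources` is an iterable of iterables of message strings; this is a
    hypothetical stand-in for the mboxes read by the real script.
    Returns the total number of messages written.
    """
    rng = random.Random(seed)
    outdirs = [outputbasepath + ("%d" % i) for i in range(1, n + 1)]
    for d in outdirs:
        os.makedirs(d, exist_ok=True)

    counter = 0  # shared across all sources, so file names never collide
    for source in sources:
        for text in source:
            i = rng.randrange(n)  # pick a target directory at random
            counter += 1
            with open(os.path.join(outdirs[i], str(counter)), "w") as f:
                f.write(text)
    return counter
```

Keeping a single counter outside the outer loop is the point of the design:
messages from the second and later inputs continue the numbering rather than
overwriting files written for the first input.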


Index: splitndirs.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/splitndirs.py,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** splitndirs.py	20 Sep 2002 19:30:52 -0000	1.2
--- splitndirs.py	20 Sep 2002 20:00:45 -0000	1.3
***************
*** 3,7 ****
  """Split an mbox into N random directories of files.
  
! Usage: %(program)s [-h] [-s seed] [-v] -n N sourcembox outdirbase
  
  Options:
--- 3,7 ----
  """Split an mbox into N random directories of files.
  
! Usage: %(program)s [-h] [-s seed] [-v] -n N sourcembox ... outdirbase
  
  Options:
***************
*** 84,90 ****
          usage(1, "an -n value > 1 is required")
  
!     if len(args) != 2:
          usage(1, "input mbox name and output base path are required")
!     inputpath, outputbasepath = args
  
      outdirs = [outputbasepath + ("%d" % i) for i in range(1, n+1)]
--- 84,90 ----
          usage(1, "an -n value > 1 is required")
  
!     if len(args) < 2:
          usage(1, "input mbox name and output base path are required")
!     inputpaths, outputbasepath = args[:-1], args[-1]
  
      outdirs = [outputbasepath + ("%d" % i) for i in range(1, n+1)]
***************
*** 93,110 ****
              os.makedirs(dir)
  
-     mbox = mboxutils.getmbox(inputpath)
      counter = 0
!     for msg in mbox:
!         i = random.randrange(n)
!         astext = str(msg)
!         #assert astext.endswith('\n')
!         counter += 1
!         msgfile = open('%s/%d' % (outdirs[i], counter), 'wb')
!         msgfile.write(astext)
!         msgfile.close()
!         if verbose:
!             if counter % 100 == 0:
!                 sys.stdout.write('.')
!                 sys.stdout.flush()
  
      if verbose:
--- 93,111 ----
              os.makedirs(dir)
  
      counter = 0
!     for inputpath in inputpaths:
!         mbox = mboxutils.getmbox(inputpath)
!         for msg in mbox:
!             i = random.randrange(n)
!             astext = str(msg)
!             #assert astext.endswith('\n')
!             counter += 1
!             msgfile = open('%s/%d' % (outdirs[i], counter), 'wb')
!             msgfile.write(astext)
!             msgfile.close()
!             if verbose:
!                 if counter % 100 == 0:
!                     sys.stdout.write('.')
!                     sys.stdout.flush()
  
      if verbose: