[Python-checkins] python/nondist/sandbox/spambayes mboxcount.py,1.4,1.5

tim_one@users.sourceforge.net
Mon, 26 Aug 2002 13:45:28 -0700


Update of /cvsroot/python/python/nondist/sandbox/spambayes
In directory usw-pr-cvs1:/tmp/cvs-serv13516

Modified Files:
	mboxcount.py 
Log Message:
Updated stats to what Barry and I both get now.  Fiddled output.


Index: mboxcount.py
===================================================================
RCS file: /cvsroot/python/python/nondist/sandbox/spambayes/mboxcount.py,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** mboxcount.py	26 Aug 2002 18:55:26 -0000	1.4
--- mboxcount.py	26 Aug 2002 20:45:25 -0000	1.5
***************
*** 14,36 ****
  
  """
! Stats for Barry's corpora, as of 22-Aug-2002:
! 
! \code\edu-sig-clean.mbox                               252
! \code\python-dev-clean.mbox                           8326
! \code\mailman-developers-clean.mbox                   2427
! \code\python-list-clean.mbox                         85500
! \code\zope3-clean.mbox                                2177
! 
! (zope3-clean is really zope3-dev-clean.mbox)
! 
! On Linux:
  
! edu-sig-clean.mbox                252 (unparseable: 0)
! python-dev-clean.mbox            8325 (unparseable: 1)
! mailman-developers-clean.mbox    2427 (unparseable: 0)
! python-list-clean.mbox         159052 (unparseable: 22)
! zope3-clean.mbox                 2177 (unparseable: 0)
  
! (unparseable messages are likely spam)
  """
  
--- 14,29 ----
  
  """
! Stats for Barry's corpora, as of 26-Aug-2002, using then-current 2.3a0:
  
! edu-sig-clean.mbox                 252 (+ unparseable: 0)
! python-dev-clean.mbox             8326 (+ unparseable: 0)
! mailman-developers-clean.mbox     2427 (+ unparseable: 0)
! python-list-clean.mbox          159072 (+ unparseable: 2)
! zope3-clean.mbox                  2177 (+ unparseable: 0)
  
! Unparseable messages are likely spam.
! zope3-clean.mbox is really from the zope3-dev mailing list.
! The Python version matters because the email package varies across releases
! in whether it uses strict or lax parsing.
  """
  
***************
*** 91,95 ****
          for fname in fnames:
              goodn, badn = count(fname)
!             print "%-50s %7d (unparseable: %d)" % (fname, goodn, badn)
  
  if __name__ == '__main__':
--- 84,88 ----
          for fname in fnames:
              goodn, badn = count(fname)
!             print "%-35s %7d (+ unparseable: %d)" % (fname, goodn, badn)
  
  if __name__ == '__main__':
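The diff shows how the results of `count(fname)` are printed, but not how `count()` decides that a message is unparseable. The following is a minimal Python 3 sketch under stated assumptions, not the code in mboxcount.py. It assumes "unparseable" means "the email parser raises on it" and uses `email.policy.strict`, which did not exist in the 2.3a0 email package the stats were taken with. The reimplemented `count()` and the helper `format_line()` are hypothetical; only the format string is copied from revision 1.5.

```python
import email.errors
import email.policy
import mailbox
from email.parser import BytesParser

# Assumption: a message is "unparseable" if a strict-policy parser raises.
# mboxcount.py's real count() is not shown in this diff.
_strict_parser = BytesParser(policy=email.policy.strict)


def count(fname):
    """Hypothetical re-sketch: return (goodn, badn) for the mbox at fname,
    where goodn parse cleanly under a strict policy and badn raise."""
    goodn = badn = 0
    mb = mailbox.mbox(fname, create=False)
    try:
        for key in mb.iterkeys():
            raw = mb.get_bytes(key)  # message bytes without the "From " line
            try:
                _strict_parser.parsebytes(raw)
            except (email.errors.MessageError, email.errors.MessageDefect):
                badn += 1
            else:
                goodn += 1
    finally:
        mb.close()
    return goodn, badn


def format_line(fname, goodn, badn):
    # Hypothetical helper using the revision 1.5 layout: name left-justified
    # in 35 columns, good count right-justified in 7.
    return "%-35s %7d (+ unparseable: %d)" % (fname, goodn, badn)


if __name__ == "__main__":
    import sys
    for fname in sys.argv[1:]:
        print(format_line(fname, *count(fname)))
```

Narrowing the name column from 50 to 35 still fits the longest corpus name listed above (`mailman-developers-clean.mbox`, 29 characters), and the `%7d` field holds the six-digit python-list count. Under this strict policy, for example, a message declaring `Content-Type: multipart/mixed` with no `boundary` parameter raises a defect and is counted as unparseable; a lax policy would record the defect and keep going, which is the kind of release-to-release difference the docstring warns about.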