[Spambayes] training on cyrus style mailboxes

Nick Rout nick at rout.co.nz
Mon Aug 4 15:32:51 EDT 2003


Well, each file is RFC822, yes. Each directory holds a number of messages,
each with a unique, incrementing name of the form

[digit][digit]....[digit].  e.g. 12345. (yes, that full stop is the last
character of the file name)

Each directory also contains the following files:

cyrus.cache
cyrus.header
cyrus.index

plus a directory for each subfolder (the inbox is the root of all the
folders).

So what parameters would I pass to mboxtrain.py to train on a directory of
messages like that, without including the cyrus.* files and the
sub-directories?

Sorry to be thick. Here's a much shortened ls -Fl of my inbox:

-rw-------    1 cyrus    mail         1444 May 19  2002 5825.
-rw-------    1 cyrus    mail         1267 May 19  2002 5826.
-rw-------    1 cyrus    mail          955 May 19  2002 5827.
-rw-------    1 cyrus    mail          778 May 19  2002 5828.
-rw-------    1 cyrus    mail         3779 May 19  2002 5829.
-rw-------    1 cyrus    mail         2152 May 19  2002 5830.
drwx------    2 cyrus    mail        24576 Jul  2 12:02 backup/
-rw-------    1 cyrus    mail      5055200 Aug  4 14:20 cyrus.cache
-rw-------    1 cyrus    mail          158 May  1 13:08 cyrus.header
-rw-------    1 cyrus    mail       308572 Aug  4 14:28 cyrus.index
drwx------    2 cyrus    mail         4096 Jun 30 15:45 Draft/
drwx------    2 cyrus    mail         4096 Jun 27 14:06 Drafts/
drwx------    2 cyrus    mail         4096 Aug  1 12:46 Keep/
drwx------   27 cyrus    mail         4096 Aug  4 13:36 Mailing_Lists/
drwx------    7 cyrus    mail         4096 Apr 16 14:31 Personal/
drwx------    2 cyrus    mail         4096 Mar 18 14:41 Project/
drwx------    2 cyrus    mail        77824 Aug  4 14:13 Sent/
drwx------    2 cyrus    mail        57344 Aug  4 11:15 SPAM/
drwx------    2 cyrus    mail       163840 Aug  4 14:28 Trash/
drwx------   40 cyrus    mail         4096 Dec  2  2002 Work/
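
To make the selection concrete, here is a minimal sketch of the filtering
described above, using only the Python standard library. It is not
mboxtrain.py's own logic, and the function names (cyrus_message_paths,
load_messages) are made up for illustration. It assumes that message files
are exactly the regular files named digits-plus-full-stop, and that
everything else (cyrus.*, sub-directories) should be skipped.

```python
import email
import os
import re

# Assumption: Cyrus message files are named <digits>. (trailing full stop).
MESSAGE_NAME = re.compile(r"\d+\.")


def cyrus_message_paths(folder):
    """Yield paths of message files directly inside one Cyrus folder.

    Skips cyrus.cache, cyrus.header, cyrus.index and any sub-directories,
    and does not recurse into subfolders.
    """
    names = [
        name for name in os.listdir(folder)
        if MESSAGE_NAME.fullmatch(name)
        and os.path.isfile(os.path.join(folder, name))
    ]
    # Sort numerically so 10000. comes after 5826., not before it.
    for name in sorted(names, key=lambda n: int(n[:-1])):
        yield os.path.join(folder, name)


def load_messages(folder):
    """Parse each message file in the folder as an RFC822 message."""
    for path in cyrus_message_paths(folder):
        with open(path, "rb") as fh:
            yield email.message_from_binary_file(fh)
```

Each parsed message could then be handed to whatever does the training.
Reading the files needs the same permissions as the cyrus user, given the
-rw------- modes in the listing.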


On Mon, 4 Aug 2003 13:40:28 +1200
"Meyer, Tony" <T.A.Meyer at massey.ac.nz> wrote:

> > I have a cyrus imap store. I want to train spambayes on these 
> > mail boxes. They are (like maildir) one file per message, but 
> > also differ from maildir in a number of ways.
> 
> How different?  Are they more or less RFC822?  If they are then the
> existing tools would probably work.  If they aren't, then you could
> create a function to convert them to RFC822 and pass the results of that
> to the tokenizer.  Once you have the tokens, it doesn't matter what the
> format is, you can just pass them to the appropriate storage object.
> 
> =Tony Meyer
> 




