[Spambayes] training on cyrus style mailboxes
Nick Rout
nick at rout.co.nz
Mon Aug 4 15:32:51 EDT 2003
Well, each file is RFC822, yes. Each directory has a number of messages,
each with a unique, incrementing name of the form
[digit][digit]....[digit]. e.g. 12345. (Yes, that full stop is the last
character of the file name.)
Also, each directory contains the following files:
cyrus.cache
cyrus.header
cyrus.index
and a directory for each subfolder (the inbox is the root of all the
folders).
So what parameters would I pass to mboxtrain.py for a directory of
messages like that, without including the cyrus.* files and the
sub-directories?
Sorry to be thick. Here's a much-shortened ls -Fl of my inbox:
-rw------- 1 cyrus mail 1444 May 19 2002 5825.
-rw------- 1 cyrus mail 1267 May 19 2002 5826.
-rw------- 1 cyrus mail 955 May 19 2002 5827.
-rw------- 1 cyrus mail 778 May 19 2002 5828.
-rw------- 1 cyrus mail 3779 May 19 2002 5829.
-rw------- 1 cyrus mail 2152 May 19 2002 5830.
drwx------ 2 cyrus mail 24576 Jul 2 12:02 backup/
-rw------- 1 cyrus mail 5055200 Aug 4 14:20 cyrus.cache
-rw------- 1 cyrus mail 158 May 1 13:08 cyrus.header
-rw------- 1 cyrus mail 308572 Aug 4 14:28 cyrus.index
drwx------ 2 cyrus mail 4096 Jun 30 15:45 Draft/
drwx------ 2 cyrus mail 4096 Jun 27 14:06 Drafts/
drwx------ 2 cyrus mail 4096 Aug 1 12:46 Keep/
drwx------ 27 cyrus mail 4096 Aug 4 13:36 Mailing_Lists/
drwx------ 7 cyrus mail 4096 Apr 16 14:31 Personal/
drwx------ 2 cyrus mail 4096 Mar 18 14:41 Project/
drwx------ 2 cyrus mail 77824 Aug 4 14:13 Sent/
drwx------ 2 cyrus mail 57344 Aug 4 11:15 SPAM/
drwx------ 2 cyrus mail 163840 Aug 4 14:28 Trash/
drwx------ 40 cyrus mail 4096 Dec 2 2002 Work/
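
Going by the listing above, the message files can be told apart from the
cyrus.* files and the sub-folders by name alone. The following is a minimal
sketch, not part of Spambayes: the function name cyrus_message_paths is made
up for illustration, and it assumes every message file is named as digits
followed by a single full stop. How the resulting paths are handed to the
trainer is left open.

```python
import os
import re

# Assumption: Cyrus message files are named <digits>. (digits plus a full stop).
MESSAGE_NAME = re.compile(r"[0-9]+\.")


def cyrus_message_paths(folder):
    """Return the message files directly inside `folder`, in numeric order.

    Skips cyrus.cache, cyrus.header, cyrus.index, any other file whose name
    does not match the pattern, and all sub-directories (no recursion).
    """
    paths = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        # Regular files only, so a sub-folder that happens to match is skipped.
        if MESSAGE_NAME.fullmatch(name) and os.path.isfile(path):
            paths.append(path)
    # Sort by message number rather than by string, so 10000. follows 5830.
    paths.sort(key=lambda p: int(os.path.basename(p)[:-1]))
    return paths
```

Called on the inbox directory, this would return only entries such as 5825.
through 5830. and leave backup/, Sent/, SPAM/ and the cyrus.* files out.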
On Mon, 4 Aug 2003 13:40:28 +1200
"Meyer, Tony" <T.A.Meyer at massey.ac.nz> wrote:
> > I have a Cyrus IMAP store. I want to train spambayes on these
> > mailboxes. They are (like maildir) one file per message, but
> > also differ from maildir in a number of ways.
>
> How different? Are they more or less RFC822? If they are then the
> existing tools would probably work. If they aren't, then you could
> create a function to convert them to RFC822 and pass the results of that
> to the tokenizer. Once you have the tokens, it doesn't matter what the
> format is, you can just pass them to the appropriate storage object.
>
> =Tony Meyer
>
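
One way to check the "more or less RFC822" question raised in the quoted
reply is to parse a message file with Python's standard email package and see
whether any headers come out. This is only a sketch under that assumption:
load_message and looks_like_rfc822 are illustrative names, not Spambayes
functions, and passing the parsed message on to the tokenizer is not shown.

```python
from email import message_from_binary_file


def load_message(path):
    """Parse one message file into an email.message.Message object."""
    with open(path, "rb") as fh:
        return message_from_binary_file(fh)


def looks_like_rfc822(msg):
    """Rough check: the parser found at least one header field."""
    return len(msg.keys()) > 0
```

If a sample of files from the store all pass this check, the existing tools
mentioned above have a reasonable chance of reading them as they are.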