From tameyer at ihug.co.nz Fri Oct 1 02:08:52 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Fri Oct 1 02:08:57 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: Message-ID: [classification@spambayes.invalid] [Tony] > I think this would be a reasonable method of notating the > To: header. [Kenny] > +1 here. Done. =Tony Meyer From tameyer at ihug.co.nz Fri Oct 1 02:40:54 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Fri Oct 1 02:41:00 2004 Subject: [spambayes-dev] button ideas (oh boy) In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13065577B2@its-xchg4.massey.ac.nz> Message-ID: [From a message way back in May] [Kenny] > We already have buttons to perform both training functions > ("this is spam" and "this is good"), but the current labels could > be misleading if we repurpose them. Even now the labels can be > confusing to some (e.g. "Delete as Spam" doesn't really delete > anything). I suggest renaming the buttons to more generic "Spam" > and "Not Spam" (which also reduces the amount of space used by > the toolbar). Since this, there have been more instances of people getting confused by the word "Delete" in "Delete as Spam". I'm very much +1 on changing to "Spam" and "Not Spam" to (a) reduce confusion, (b) gain space, and (c) allow them to appear in all folders. IIRC, this wouldn't take very much work - a couple of small code changes and the various docs. I think the name change is worthwhile for (a) and (b) even if we don't go with (c). This would be only on CVS head (i.e. 1.1), obviously, not for 1.0.x. Does anyone have any objections? Otherwise I'll just go ahead and make these changes (either Sunday or Monday). > Keeping in mind that we can only train a given message once, > we'll probably still need to dynamically determine which buttons > are available for each message. 
> Disabling inappropriate buttons might be better than removing them so
> that the toolbar isn't bouncing around as the user moves to different
> messages in the folder.
[...]
> If I get a chance, I'll have a look at what it would take to
> check the state each time the user changes the message selection.
> I've considered trying something like this several times, but
> just haven't gotten around to it.

I'm still +0 on this (-0 if it's removing the buttons rather than
disabling). However, if it really doesn't take much processing power and a
suitable rule for selection of multiple messages can be found, I have no
objections.

The problem, I guess, is that there is no feedback when clicking "Spam" in
the spam folder (etc), since the message doesn't move (although it would be
rescored, I think, so if you have the spam column, you might see a change).
So maybe I will look into disabling, but Kenny can still beat me to it...

=Tony Meyer

From ta-meyer at ihug.co.nz Fri Oct 1 02:51:38 2004
From: ta-meyer at ihug.co.nz (Tony Meyer)
Date: Fri Oct 1 02:51:42 2004
Subject: [spambayes-dev] Adding the ability to set an option for ham with the Outlook plug-in
Message-ID: 

Since we're (well, me so far) discussing changes to the Outlook plug-in,
what do people think about this one:

[ 1036970 ] Allow Outlook plugin to move ham to a designated folder

It's pretty self-explanatory really. The steps would basically just be
adding another option to the Filtering tab and another couple of lines of
code.

I think it's a good idea in general, and doubt that it would confuse many
people, since I get the feeling that most of them use the Wizard results and
never actually look in the SpamBayes Manager. The main problem I see is that
there's no room left on the Filtering tab for this option (and resizing one
tab means resizing them all).
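A minimal sketch of what the tracker request might amount to, assuming a hypothetical `ham_folder` option next to the existing spam and unsure destinations. The names `FilterOptions` and `destination_for` and the threshold values are illustrative assumptions, not the plug-in's actual code or defaults:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilterOptions:
    """Hypothetical filtering options; the names are illustrative only."""
    spam_folder: str = "Spam"
    unsure_folder: str = "Unsure"
    ham_folder: Optional[str] = None   # the proposed option; None leaves ham in place
    spam_threshold: float = 0.90       # assumed cut-offs on a 0..1 score
    unsure_threshold: float = 0.15


def destination_for(score: float, options: FilterOptions) -> Optional[str]:
    """Return the folder a message should move to, or None to leave it alone."""
    if score >= options.spam_threshold:
        return options.spam_folder
    if score >= options.unsure_threshold:
        return options.unsure_folder
    # Ham: only moved when the (hypothetical) ham folder has been configured.
    return options.ham_folder
```

With the option left unset the behaviour is unchanged, which is why the change could stay small: only the ham branch consults the new setting.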
=Tony Meyer

From sethg at GoodmanAssociates.com Fri Oct 1 07:05:06 2004
From: sethg at GoodmanAssociates.com (Seth Goodman)
Date: Fri Oct 1 07:05:02 2004
Subject: [spambayes-dev] Adding the ability to set an option for ham with the Outlook plug-in
In-Reply-To: 
Message-ID: 

> From: Tony Meyer
> Sent: Thursday, September 30, 2004 7:52 PM
>
> Since we're (well, me so far) discussing changes to the Outlook
> plug-in, what do people think about this one:
>
> [ 1036970 ] Allow Outlook plugin to move ham to a designated folder

The first scenario in the tracker makes a lot of sense to me, where Exchange
does suboptimal filtering and Spambayes could recover ham from the spam
folder automatically.

I don't quite understand how it would help the second scenario, where it
would replace background mode. You still wouldn't know whether Outlook rules
or Spambayes got to a message first, so the results could be erratic. I have
a lot of list mail that never contains spam that I don't want Spambayes to
see. If it saw the list mail some of the time, I would have to train it on
that mail, which would make the database unnecessarily bigger and perhaps
affect the classification of my other mail.

While we're talking about Outlook features and possibly avoiding background
mode, I have been using an add-in from Tech-Hit that accomplishes two things
that might be useful if Spambayes could do them. The add-in is called
AutoRead. The main thing it does is control the Outlook tray icon. The
second potentially useful thing is that it is invoked as a custom action in
an Outlook rule. When you write a rule that includes this custom action, you
can optionally mark a message as read and optionally turn off the envelope
icon only if it was previously off before the message. I use it in my list
mail rules to keep that mail from turning on the envelope icon and it seems
to work well.
Whatever approach they use might be useful for invoking a resident Spambayes process from an Outlook rule (if that is even remotely feasible) and controlling the envelope icon depending on the classification. This would let you determine the order of actions in Outlook without background mode. While that works well enough, it is fairly slow. The first mail fetch in the morning can take a while to process at one second per message. BTW, I can understand the need for the first delay timer to wait for the incoming mail to stop and the Outlook rules to complete, but what is the reason for the second (message-to-message) timer. It obviously needs to be there or you wouldn't have added it, I was just curious why it is needed. That's the time delay that is somewhat annoying. > > It's pretty self-explanatory really. The steps would basically just be > adding another option to the Filtering tab and another couple of lines of > code. > > I think it's a good idea in general, and doubt that it would confuse many > people, since I get the feeling that most of them use the Wizard > results and never actually look in the SpamBayes Manager. Agreed. Overall a good idea. Regards, Seth Goodman Goodman Associates, LLC tel 608.833.9933 fax 608.833.9966 From sethg at GoodmanAssociates.com Fri Oct 1 07:17:55 2004 From: sethg at GoodmanAssociates.com (Seth Goodman) Date: Fri Oct 1 07:17:53 2004 Subject: [spambayes-dev] button ideas (oh boy) In-Reply-To: Message-ID: > From: Tony Meyer > Sent: Thursday, September 30, 2004 7:41 PM <...> > Since this, there have been more instances of people getting > confused by the word "Delete" in "Delete as Spam". I'm very much +1 > on changing to "Spam" and "Not Spam" to (a) reduce confusion, > (b) gain space, and (c) allow them to appear in all folders. > > IIRC, this wouldn't take very much work - a couple of small code > changes and the various docs. I think the name change is worthwhile > for (a) and (b) even if we don't go with (c). 
This would be only on > CVS head (i.e. 1.1), obviously, not for 1.0.x. > > Does anyone have any objections? Otherwise I'll just go ahead and make > these changes (either Sunday or Monday). Great feature. Please try to do c). > > > Keeping in mind that we can only train a given message once, > > we'll probably still need to dynamically determine which buttons > > are available for each message. Disabling inappropriate buttons > > might be better than removing them so that the toolbar isn't > > bouncing around as the user moves to different messages in the > > folder. > [...] > > If I get a chance, I'll have a look at what it would take to > > check the state each time the user changes the message selection. > > I've considered trying something like this several times, but > > just haven't gotten around to it. > > I'm still +0 on this (-0 if it's removing the buttons rather than > disabling). However, if it really doesn't take much processing > power and a suitable rule for selection of multiple messages can > be found, I have no objections. If it's a lot of trouble, and it probably is, don't even bother disabling the buttons. Leave that for some future release when there's nothing left to do (that'll be real soon now). Just take the single or multiple selection and do the action on any messages that it applies to. It would still be a big improvement to have both buttons in all folders. If you select a message in the spam folder that is already trained as spam and hit the spam button, nothing will happen, but so what? At least you can train on spam messages that didn't score very spammy without first moving them to the unsure folder. Likewise for ham messages that didn't score very hammy. > > The problem, I guess, is that there is no feedback when clicking "Spam" in > the spam folder (etc), since the message doesn't move (although > it would be rescored, I think, so if you have the spam column, you'd might > see a change). 
So maybe I will look into disabling, but Kenny can still > beat me to it... . I think the change in score would be a good enough indicator that something happened. If you guys eventually adopt TTE, perhaps training multiple times on a message would be something you want to do. -- Seth Goodman From tameyer at ihug.co.nz Fri Oct 1 07:26:14 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Fri Oct 1 07:26:21 2004 Subject: [spambayes-dev] Adding the ability to set an option for hamwiththe Outlook plug-in In-Reply-To: Message-ID: > I don't quite understand how it would help the second > scenario, where it would replace background mode. You still > wouldn't know whether Outlook rules or Spambayes got to a > message first, so the results could be erratic. That bit in the tracker isn't well explained. The idea isn't really to replace background mode, but to provide the ability to run rules after SpamBayes. The actual order would be Inbox-Rules, SpamBayes, Hambox-Rules. So if you wanted your rules to run after SpamBayes, you just move all the rules to operate on the Hambox, if you want your rules to run before SpamBayes, you use background filtering (and you can mix if you like). The most common case that I have seen where this is requested or would solve the given problem is where there is some sort of access to the inbox from another place (sync'ing with another device, for example) and they have a rule to do this, but would like the rule to take place after the spam is removed. > While we're talking about Outlook features and possibly > avoiding background mode, I have been using an add-in from Tech-Hit > that accomplishes two things that might be useful if Spambayes > could do them. The add-in is called AutoRead. The main thing it > does is control the Outlook tray icon. The second potentially > useful thing is that it is invoked as a custom action in an Outlook > rule. 
When you write a rule that includes this custom action, > you can optionally mark a message as read and optionally turn off the > envelope icon only if it was previously off before the message. Addressing these separately: I don't really know anything about custom actions. I would have thought that if SpamBayes could be done via custom actions, then that's what Mark would have originally done, since it would make many things easier. I presume that there is something stopping us doing that (but, as I said, I know basically nothing about them). The envelope icon is probably the most requested change for the plug-in, I would think. It would be great if we could implement *some* solution for 1.1. We have code to turn the icon off - the reason that we don't is that there could be other mail arriving that we don't know about. (If we were part of the rules system, then things could be different). My thoughts are that it's so often requested that it would make things easier for us if we just provided it (perhaps as an experimental option to avoid cluttering the Manager dialog) so people could choose to turn it off for unsure/spam mail. We would point out that it might mean that it was turned off some times when it shouldn't have been. I do like the suggested ideas of providing our own icon/sound (e.g. displayed when mail is classified as ham), but that does seem a lot of work. > BTW, I can understand the need for the first delay timer to > wait for the incoming mail to stop and the Outlook rules to > complete, but what is the reason for the second (message-to-message) > timer. It obviously needs to be there or you wouldn't have > added it, I was just curious why it is needed. > That's the time delay that is somewhat annoying. I have no idea. As you say, I'm sure there was a reason. 
Maybe Mark will chime up and answer :) =Tony Meyer From kenny.pitt at gmail.com Fri Oct 1 16:24:25 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Fri Oct 1 16:24:38 2004 Subject: [spambayes-dev] button ideas (oh boy) In-Reply-To: References: <1ED4ECF91CDED24C8D012BCF2B034F13065577B2@its-xchg4.massey.ac.nz> Message-ID: <2a052b9904100107243fdfe54e@mail.gmail.com> Tony Meyer wrote: > [From a message way back in May] > > [Kenny] > > I suggest renaming the buttons to more generic "Spam" > > and "Not Spam" (which also reduces the amount of space used by > > the toolbar). > > Since this, there have been more instances of people getting confused by the > word "Delete" in "Delete as Spam". I'm very much +1 on changing to "Spam" > and "Not Spam" to (a) reduce confusion, (b) gain space, and (c) allow them > to appear in all folders. Since I initiated this, I already had most of it on hand. I updated the text in the docs to go along with the code changes, and I'll check this in shortly. This is just to change the names on the buttons. I'll leave displaying both buttons in all folders for a separate step because there are more issues there. I also didn't update screenshots pending possible icon changes (see below). > So maybe I will look into disabling, but Kenny can still beat me > to it... . I'm not dealing yet with enabling based on individual message status, but changing to disabling the buttons rather than hiding them is extremely trivial. I gave it a try and found one small problem with it. We don't have any transparency in our icon images because of the crazy way Outlook requires the transparency mask to be created, so the disabled icon shows up as just a solid gray square. Not very attractive. For a disabled icon, Outlook seems to gray out everything except transparent or *white* pixels. I'm playing with some replacement icons that use white symbols in the foreground (an X for Spam and a checkmark for Not Spam). 
These also have backgrounds that fill the entire icon square, thus eliminating the problems with the non-transparent backgrounds around the current icons. To give credit where it is due, I stole the background idea from InBoxer. I figured Sean wouldn't be too upset given where a large chunk of his code came from. Anyway, I've attempted to attach my "rough draft" bitmaps for review and comment. Hopefully they'll make it to the list intact. The checkmark icon needs some cleaning up, but the X icon gives a pretty good idea of what these would look like. -- Kenny Pitt -------------- next part -------------- A non-text attachment was scrubbed... Name: delete_as_spam.bmp Type: image/bmp Size: 822 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20041001/459113d6/delete_as_spam.bmp -------------- next part -------------- A non-text attachment was scrubbed... Name: recover_ham.bmp Type: image/bmp Size: 822 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20041001/459113d6/recover_ham.bmp From brown at dui-dwi.com Sat Oct 2 06:13:44 2004 From: brown at dui-dwi.com (DUI-DWI) Date: Sat Oct 2 06:14:11 2004 Subject: [spambayes-dev] button ideas (oh boy) In-Reply-To: <2a052b9904100107243fdfe54e@mail.gmail.com> Message-ID: <20041002041408.DXNR1786.imf19aec.mail.bellsouth.net@seeker> Guys, I'm a SpamBayes addict, and I kicked out a screen shot of what Kenny is envisioning. I can update it as he further tweaks the graphics. http://www.headlinesmarketing.com/temp/New_SpamBayes.jpg Let me know if you want it to look any different. 
Erik Brown - Webmaster

From brown at dui-dwi.com Sat Oct 2 15:38:38 2004
From: brown at dui-dwi.com (DUI-DWI)
Date: Sat Oct 2 15:39:09 2004
Subject: [spambayes-dev] button ideas (oh boy)
In-Reply-To: <20041002041408.DXNR1786.imf19aec.mail.bellsouth.net@seeker>
Message-ID: <20041002133906.HAWL1756.imf17aec.mail.bellsouth.net@seeker>

Dev team,

I've been playing around with the Spambayes menu graphic for the
up-and-coming version. I present to you my first comp, I plan on making
several more. If you look closely, I watermarked some Bayesian modelling in
the background.

http://www.headlinesmarketing.com/temp/comp01.jpg

Let me know what you think.

Webmaster - Erik Brown

From hernan at orgmf.com.ar Mon Oct 4 17:42:26 2004
From: hernan at orgmf.com.ar (Hernán Martínez Foffani)
Date: Mon Oct 4 17:43:23 2004
Subject: [spambayes-dev] I18N
Message-ID: 

Hi!

My name is Hernán Foffani and would like to collaborate
translating SpamBayes to Spanish while providing a general
framework for other different languages as well.
I also offer myself to search for foreign language
volunteers and coordinate their work.

Tony has advised me to start building a todo list. More than
a list what I've written is an outline of different ideas.
Here is what I've got up to now after many hours of thorough
research and painstakingly accurate work (;-).

a) Define the translation units.

   I would like to focus on small but complete units of
   the application.

   1. Outlook addin
   2. Windows tools
   3. Win32 bundle docs
   4. Win32 installer
   5. Other sp_* tools
   6. Website

   The order of the list reflects a personal preference
   (I'm an Outlook user) not a technical reason.

b) Decide on the technical means (gettext or whatever)

   Ideally I'd rather use a dictionary based for the
   string literals where the message database is outside
   Spambayes' source code. Like gettext.
   It doesn't burden developers too much and it doesn't break the
   application if some language translation falls behind (it will happen.)

   I understand that there's a problem with outlook's dialogs
   which are RC based. I admit that I never used rcparse.py
   before. Do you think that integrating rcparse.py with
   gettext (or similar) can work? Is it possible to have a
   mechanism that would allow the work of translators without
   having them to build spambayes from source?

   I didn't mention localization issues. Are there any? Don't
   think so, right? Anyway, I would follow any tech decision the
   developers set in this area.

c) Delivery machinery.

   We can have external language packs, a multilingual
   distribution or complete installers per language. Lang packs
   are easier to deal with because they can be managed
   independently of the mainstream development, but it would
   imply that the main installer will be in English.
   We can start by distributing external language packs and later
   integrate them in the application installer.
   Given that Outlook is monolingual I can't think that a user
   would demand to have SB in different language(s) than its
   host, but could it be a requirement for the other tools?

d) Deploy a beta for one or two languages.

   The first multilang (multi==more than one) SB version.

e) Establish the procedures for merge nondev->dev collaborations.

   I will need a procedure or guidelines for the results of our
   work.

f) Publish a HowTo doc.

   It will be the guide for translation volunteers.

By the way, is there a way to set a test environment for the
SB Outlook addin? I would like to have more than one SB version
installed without sharing the database.

g) Ask for volunteers and coordinate their work.

   My plan is to have at least two translators per language for
   better QA.
   The exception of this rule will be Maori. ;-)

Sure I'm leaving lots of tasks out.

Regards,
-Hernán.
From tameyer at ihug.co.nz Wed Oct 6 05:13:34 2004
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Wed Oct 6 05:13:45 2004
Subject: [spambayes-dev] I18N
In-Reply-To: 
Message-ID: 

> My name is Hernán Foffani and would like to collaborate
> translating SpamBayes to Spanish while providing a general
> framework for other different languages as well.

Excellent! :)

> I also offer myself to search for foreign language
> volunteers and coordinate their work.

And even more so :)

[...]
> a) Define the translation units.
>
> I would like to focus on small but complete units of
> the application.
>
> 1. Outlook addin
> 2. Windows tools
> 3. Win32 bundle docs
> 4. Win32 installer
> 5. Other sp_* tools
> 6. Website
>
> The order of the list reflects a personal preference
> (I'm an Outlook user) not a technical reason.

I would replace #2 with "sb_server". There aren't really any Windows tools
other than the installer and the Outlook add-in (well, there's the tray
application, but that would take almost no work and could be included in the
sb_server work). Other than that, it looks good - including the ordering, I
would say, although sb_server is likely to be an easier prospect than the
Outlook plug-in.

I would say that this should definitely be done against CVS Head - i.e. this
will end up having a result for a 1.1.x release, not a 1.0.x one. We should
also stick to interface changes rather than any changes necessary to work
with languages where (eg) split-on-whitespace doesn't really work (there's a
separate sf tracker about that). Tim's decreed that that sort of thing
should be a branch, and his reasoning works for me.

> b) Decide on the technical means (gettext or whatever)
>
> Ideally I'd rather use a dictionary based for the
> string literals where the message database is outside
> Spambayes' source code. Like gettext. It doesn't burden
> developers too much and it doesn't break the application if
> some language translation falls behind (it will happen.)

From the little I know (I haven't done any translation work before) gettext
seems the appropriate choice for everything apart from (but maybe also
including) the Outlook plug-in.

> I understand that there's a problem with outlook's dialogs
> which are RC based. I admit that I never used rcparse.py
> before. Do you think that integrating rcparse.py with
> gettext (or similar) can work?

I think something could be done. The .rc files are generated by the MS
tools - does anyone know what they do for internationalisation work?
Following that would possibly be a good idea. Otherwise the scripts could
probably be adapted to use (eg) gettext. This is the main reason that the
Outlook plug-in will be much more complex than sb_server, however. It would
be the first thing to figure out, I would think.

> Is it possible to have a
> mechanism that would allow the work of translators without
> having them to build spambayes from source?
> c) Delivery machinery.
[...]

I don't really know much about this. We can probably figure it out closer
to a release time.

> e) Establish the procedures for merge nondev->dev collaborations.
>
> I will need a procedure or guidelines for the results of our
> work.

Use the SourceForge system to submit patches against current CVS. You can
assign them to me if you like (anadelonbrin). If it turns out that someone
needs to be checking in things all the time, then we can organise check-in
rights and so on.

> f) Publish a HowTo doc.
>
> It will be the guide for translation volunteers.

This is a good idea. It could go in the README-DEVEL.txt file probably
(along with the instructions for building releases, and so on).

> By the way, is there a way to set a test environment for the
> SB Outlook addin? I would like to have more than one SB version
> installed without sharing the database.

If you're running from source, then you can just run the addin.py script
from a new checkout of the directory.
To have a different data directory/configuration file, you can use the default_configuration.ini script technique described in the docs, I think. Just put one in each Outlook2000 directory pointing to the appropriate place. > g) Ask for volunteers and coordinate their work. > > My plan is to have at least two translators per language for > better QA. > The exception of this rule will be Maori. ;-) This is a nice goal, but I'm not sure whether it will work or not. I wouldn't want to turn anyone away just because they were the only person willing to do a language. =Tony Meyer From tameyer at ihug.co.nz Wed Oct 6 05:17:12 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Oct 6 05:18:51 2004 Subject: [spambayes-dev] button ideas (oh boy) In-Reply-To: Message-ID: > I've been playing around with the Spambayes menu graphic for > the up-and-coming version. I present to you my first comp, I > plan on making several more. If you look closely, I > watermarked some Bayesian modelling in the background. > > http://www.headlinesmarketing.com/temp/comp01.jpg To be honest, I don't really care about the graphic at all, but this does look quite nice. I would say that we probably should stick with the standard "Python Powered" logo, though: Apart from that, it looks ok to me. Does anyone else have an opinion? =Tony Meyer From tameyer at ihug.co.nz Wed Oct 6 05:25:17 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Oct 6 05:25:32 2004 Subject: [spambayes-dev] button ideas (oh boy) In-Reply-To: Message-ID: > Since I initiated this, I already had most of it on hand. I updated > the text in the docs to go along with the code changes, and I'll check > this in shortly. I've been using this since then, and it seems ok to me. It's taken a while to get used to shorter buttons, but it seems natural again now. The toolbar is certainly *much* smaller. 
> I'm not dealing yet with enabling based on individual message status, > but changing to disabling the buttons rather than hiding them is > extremely trivial. Do you think it's worth changing to disabling if we don't do it on a message by message basis? > I gave it a try and found one small problem with > it. We don't have any transparency in our icon images because of the > crazy way Outlook requires the transparency mask to be created, so the > disabled icon shows up as just a solid gray square. Not very > attractive. > > For a disabled icon, Outlook seems to gray out everything except > transparent or *white* pixels. I'm playing with some replacement > icons that use white symbols in the foreground (an X for Spam and a > checkmark for Not Spam). These also have backgrounds that fill the > entire icon square, thus eliminating the problems with the > non-transparent backgrounds around the current icons. The ones you made look nice, but I must admit that I'm quite attached to the little smiley face :) Can't we just mark the pixels around the smiley face as transparent? (I am very ignorant here). And maybe change the lines on the face from black to white? =Tony Meyer From ta-meyer at ihug.co.nz Wed Oct 6 05:50:07 2004 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Wed Oct 6 05:50:12 2004 Subject: [spambayes-dev] SpamBayes announce list & moderation Message-ID: According to spambayes-announce is moderated: """ The spambayes-announce list - a very low volume mailing list announcing new releases. This is a moderated mailing list: all postings to this list are subject to a human moderator's review before they're sent to the list. The moderator will reject off-topic messages. """ However, I don't think this is true. There were a couple of minorly off-topic messages a while back, and the mailman settings don't seem to say that it is (emergency moderation is off, and the moderation flag doesn't default to on, so they're all off). Am I wrong, and it is moderated? 
If not, should it be? (Tim's the one that wrote the above description, apart from the first sentence). =Tony Meyer From richie at entrian.com Wed Oct 6 09:14:21 2004 From: richie at entrian.com (Richie Hindle) Date: Wed Oct 6 09:14:23 2004 Subject: [spambayes-dev] I18N In-Reply-To: References:

Message-ID: <3j67m09242f6h0shdekvoh8kjbf3f27k15@4ax.com> [Tony] > The .rc files are generated by the MS > tools - does anyone know what they do for internationalisation work? They simply duplicate the whole file for each language. Windows isn't capable of laying out a dialog according to the text on it - all controls are absolutely positioned. That means that you need to manually expand all the controls to double their English size when you translate to German. 8-) You can't really edit the things in a text editor, but there may well be free .rc editors out there (nothing obvious from a 2-second Google, but there may be something). -- Richie Hindle richie@entrian.com From kennypitt at hotmail.com Wed Oct 6 15:48:24 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Oct 6 15:49:04 2004 Subject: [spambayes-dev] I18N In-Reply-To: <3j67m09242f6h0shdekvoh8kjbf3f27k15@4ax.com> Message-ID: Richie Hindle wrote: > [Tony] >> The .rc files are generated by the MS >> tools - does anyone know what they do for internationalisation work? > > They simply duplicate the whole file for each language. Windows > isn't capable of laying out a dialog according to the text on it - > all controls are absolutely positioned. That means that you need to > manually expand all the controls to double their English size when > you translate to German. 8-) Each resource has a language id associated with it. It's possible to put multiple language versions into the same resource file, and Windows will choose the closest match to the user's current language. It seems that the most common approach, however, is to compile all the resources into a separate resource DLL and then have a different installer for each language that installs the appropriate version of that language DLL. This is probably because the resources would get very bloated if they contained all the different languages in one file. This approach would lend itself to the language pack idea, I think. 
The default install could install the English resource DLL, and then each language pack could install an appropriate replacement DLL over the English version. Unfortunately, our current approach with rcparser.py does not utilize Windows resource DLL's at all. It parses the .RC file directly and turns it into a .py file containing the data needed to generate the dialogs at runtime. I haven't looked at the rcparser code so I don't know how it would handle the language identifiers or multiple languages in the same resource file. > You can't really edit the things in a text editor, but there may well > be free .rc editors out there (nothing obvious from a 2-second > Google, but there may be something). Well, you *can* edit them in a text editor, but it is a bit of a pain in the you-know-where. I'm aware of a free RC compiler or two, but not of a visual editor unfortunately. -- Kenny Pitt From kennypitt at hotmail.com Wed Oct 6 15:59:19 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Oct 6 16:00:05 2004 Subject: [spambayes-dev] button ideas (oh boy) In-Reply-To: Message-ID: Tony Meyer wrote: > To be honest, I don't really care about the graphic at all, but this > does look quite nice. I would say that we probably should stick with > the standard "Python Powered" logo, though: > > > > Apart from that, it looks ok to me. Does anyone else have an opinion? I definitely agree that we should stick with the standard "Python Powered" logo if we make any changes to the graphics. While the new graphic does look nice, I don't have any problem with the old one and I'm not sure there's a need to change the graphics just for the sake of changing them. On the other hand, if we had a consistent "branding" involving graphics and styling for the Web site, the Outlook add-in graphics, and the Web UI for sb_server and friends, then that might be worth more. 
-- Kenny Pitt From kennypitt at hotmail.com Wed Oct 6 16:12:56 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Oct 6 16:14:03 2004 Subject: [spambayes-dev] button ideas (oh boy) In-Reply-To: Message-ID: Tony Meyer wrote: > ... Can't we just mark the pixels around > the smiley face as transparent? I've done this before in C++, and unfortunately Microsoft made it a bit complicated. It involves creating two separate images: the original and a two-color mask version that defines which pixels are transparent and which aren't. You then have to register two special clipboard formats and copy both images to the clipboard using the appropriate formats. I took a look at doing this in Python a while back. After looking through the pywin32 source code for the clipboard handling several times, I was unable to figure out any way to get the bitmaps onto the clipboard in the required format. I think we would need some enhancements to pywin32 to be able to do this directly from Python. If I had the time, I could probably put together a custom .pyd extension library specifically for the purpose of handling the toolbar images. I have all the C++ code for creating the mask image and putting everything on the clipboard. I would just need to work out the Python integration bits, which is something I've never worked on before. The question is, do we think that effort would be worthwhile? If so, how would we incorporate the .pyd into our development process so that people can still run easily from source? -- Kenny Pitt From kennypitt at hotmail.com Wed Oct 6 16:33:28 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Oct 6 16:34:08 2004 Subject: [spambayes-dev] button ideas (oh boy) In-Reply-To: Message-ID: Tony Meyer wrote: >> I'm not dealing yet with enabling based on individual message status, >> but changing to disabling the buttons rather than hiding them is >> extremely trivial. 
> > Do you think it's worth changing to disabling if we don't do it on a > message by message basis? It would probably only be worthwhile from the standpoint of maybe being less confusing for users. We've gotten quite a few "the Recover from Spam button is missing" type questions in the past, and those might be easier to answer if the buttons are supposed to be there all the time. I haven't given up on the possibility of enabling on a message-by-message basis. It just requires more significant changes than simply replacing the current hide-it code with disable-it code, and I haven't had the time to look into it further. The support is obviously there to check the status of individual messages because we do it during training. I'm just not sure how easy it is to detect each change of message selection, or how much overhead it would be to check the status every time the user highlights a different message. There's also still the issue of what is the right thing to do if the user selects multiple messages. -- Kenny Pitt From hernan at orgmf.com.ar Wed Oct 6 17:59:22 2004 From: hernan at orgmf.com.ar (Hernan Martinez Foffani) Date: Wed Oct 6 18:00:22 2004 Subject: [spambayes-dev] I18N In-Reply-To: Message-ID: > [standard DLL approach deleted] > ... > Unfortunately, our current approach with rcparser.py does not utilize > Windows resource DLL's at all. It parses the .RC file directly and > turns it into a .py file containing the data needed to generate the > dialogs at runtime. I haven't looked at the rcparser code so I don't > know how it would handle the language identifiers or multiple > languages in the same resource file. The current rcparse.py skips lang entries. I will research a bit more, but I still think that one interesting approach would be to have one dialogs.rc *and*, say, one gettext db per language. We'll need some dictionary-based message db because SB strings do not use Microsoft Resource string tables (that's wise ;-).
I believe that the easiest way to implement that without losing portability is to enhance rcparse.py so that it treats every string in the rc file it reads as if it were an entry in the messages db. That way, a translator could work without touching the .rc file, though the result may not be perfect. Later it can be improved by editing the resource file, either adjusting only the dialogs that look bad or redoing all of them from scratch. This "fallthrough" scheme will help people translate SB in the way that suits them best while not burdening developers (too) much. >> You can't really edit the things in a text editor, but there may well >> be free .rc editors out there (nothing obvious from a 2-second >> Google, but there may be something). > > Well, you *can* edit them in a text editor, but it is a bit of a pain > in the you-know-where. I'm aware of a free RC compiler or > two, but not of a visual editor unfortunately. I have found three: http://www.users.on.net/johnson/resourcehacker/ http://www.wilsonc.demon.co.uk/d7resourceexplorer.htm http://www.cs.virginia.edu/~lcc-win32/ (The last one is a C compiler that includes a visual resource editor in its distribution.) The first one reads .RES (the compiled rc format) and saves .RES, the second one reads .RES and can save .RC, and the third one can read and write the .RC file format. I'm evaluating them now. -Hernan. From hernan at orgmf.com.ar Wed Oct 6 19:08:08 2004 From: hernan at orgmf.com.ar (Hernán Martínez Foffani) Date: Wed Oct 6 19:09:11 2004 Subject: [spambayes-dev] I18N Message-ID: >> a) Define the translation units. >> >> I would like to focus on small but complete units of the >> application. >> >> 1. Outlook addin >> 2. Windows tools >> 3. Win32 bundle docs >> 4. Win32 installer >> 5. Other sb_* tools >> 6. Website >> >> The order of the list reflects a personal preference >> (I'm an Outlook user) not a technical reason. > > I would replace #2 with "sb_server".
There aren't really any Windows > tools other than the installer and the Outlook add-in (well, there's > the tray application, but that would take almost no work and could be > included in the sb_server work). Other than that, it looks good - > including the ordering, I would say, although sb_server is likely to > be an easier prospect than the Outlook plug-in. Besides sb_server and sb_tray in my SB bin directory I also have sb_pop3dnd, sb_service, sb_upload and setup_server. Never used them though. (Regarding on which I should start, see below) > I would say that this should definitely be done against CVS Head - > i.e. this will end up having a result for a 1.1.x release, not a > 1.0.x one. We should also stick to interface changes rather than any > changes necessary to work with languages where (eg) > split-on-whitespace doesn't really work (there's a separate sf > tracker about that). Tim's decreed that that sort of thing should be > a branch, and his reasoning works for me. Sorry don't get it... Do you suggest to work (me) on a branch or to send patches against HEAD? For the time being I rather send patches to SF against cvs' head. If I see that you work too much and/or my code falls behind then I'd ask you to branch SB for I18N if possible. We can decide later who will merge it back. heh... Of course I will work *only* in the interface part of SB. >> I understand that there's a problem with outlook's dialogs >> which are RC based. I admit that I never used rcparse.py >> before. Do you think that integrating rcparse.py with >> gettext (or similar) can work? > > I think something could be done. The .rc files are generated by the > MS tools - does anyone know what they do for internationalisation > work? Following that would possibly be a good idea. Otherwise the > scripts could probably be adapted to use (eg) gettext. This is the > main reason that the Outlook plug-in will be much more complex than > sb_server, however. 
It would be the first thing to figure out, I > would think. I want to start with a kind of proof-of-concept on the Outlook plugin. If I manage to translate a couple of dialogs and string literals then I'd feel more comfortable doing the rest of the repetitive job. I wouldn't like to find that after patching sb_server for gettext the Outlook plugin can't use that approach. My goal is to build or find a framework for translators that is as small as possible. >> c) Delivery machinery. > [...] > > I don't really know much about this. We can probably figure it out > closer to a release time. OK. >> e) Establish the procedures for merge nondev->dev collaborations. >> >> I will need a procedure or guidelines for the results of our >> work. > > Use the sourceforge system to submit patches against current CVS. > You can assign them to me if you like (anadelonbrin). If it turns > out that someone needs to be checking in things all the time, then we > can organise check-ins rights and so on. Fine with me. >> By the way, is there a way to set a test environment for the >> SB Outlook addin? I would like to have more than one SB version >> installed without sharing the database. > > If you're running from source, then you can just run the addin.py > script from a new checkout of the directory. To have a different > data directory/configuration file, you can use the > default_configuration.ini script technique described in the docs, I > think. Just put one in each Outlook2000 directory pointing to the > appropriate place. Err.... Sorry. I think that what I really want is to share the db. But never mind, I'll figure it out. Thanks. >> g) Ask for volunteers and coordinate their work. >> >> My plan is to have at least two translators per language for >> better QA. The exception of this rule will be Maori. ;-) > > This is a nice goal, but I'm not sure whether it will work or not. I > wouldn't want to turn anyone away just because they were the only > person willing to do a language.
You're probably right. Now imagine someone sending me a message file containing, say, "skcus seyab maps" in it, and claiming it to be the kamandarian translation... -Hernán. From waynelist at data-trak.net Wed Oct 6 22:37:30 2004 From: waynelist at data-trak.net (Wayne Pedersen) Date: Wed Oct 6 22:37:33 2004 Subject: [spambayes-dev] SpamBayes Core Message-ID: <4164578A.2010006@data-trak.net> I am interested in including the SpamBayes engine in one of my applications. I'm not a Python expert and am rather new to the language. I see many source files, but I am interested in the ones which actually identify the mail as spam or ham. The wiki didn't seem to offer much other than end user support. Where can I find some documentation on the core engine of SpamBayes? Thanks! Wayne P. From kennypitt at hotmail.com Wed Oct 6 23:15:19 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Oct 6 23:16:05 2004 Subject: [spambayes-dev] SpamBayes Core In-Reply-To: <4164578A.2010006@data-trak.net> Message-ID: Wayne Pedersen wrote: > I am interested in including the SpamBayes engine in one of my > applications. > > I'm not a Python expert and rather new to the language. > > I see many source files, but I am interested in the ones which > actually identify the mail as spam or ham. The wiki didn't seem to > offer much other than end user support. > > Where can I find some documentation on the core engine of SpamBayes? There really isn't any documentation on the source code other than the source code itself. Most of the base classifier stuff is pretty thoroughly commented, but the level of commenting varies in other areas. It's really a bit complicated to define where to find all the source code you would need. All the source for the actual classifier engine is located in the "spambayes" subdirectory of the source, but there are a number of other source files mixed in there as well.
In order to use the engine, you also need some sort of driver program that handles things like opening the correct training database files, obtaining the message to be classified, and training on incorrect messages. You will find examples of this scattered throughout the source tree, primarily in the "scripts" and "Outlook2000" subdirectories. A good place to start might be "scripts/sb_filter.py", which is probably the simplest version of the filter. It just reads messages from stdin or a file, classifies or trains them, and writes the results to stdout. -- Kenny Pitt From tameyer at ihug.co.nz Thu Oct 7 00:06:34 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Oct 7 00:06:43 2004 Subject: [spambayes-dev] SpamBayes Core In-Reply-To: Message-ID: > I see many source files, but I am interested in the ones > which actually identify the mail as spam or ham. In addition to what Kenny said: you probably just want classifier.py and tokenizer.py. If you use tokenizer.tokenize to convert a message to tokens, and then pass it through classifier's spamprob() function, you'll get a score. How much of the rest you need depends on what you want to do, exactly. You could try reading the README-DEVEL.txt file, too - IIRC it still has explanations of many of the eldest parts. > The wiki didn't seem to offer much other than end user support. You know the rule about finding something 'missing' on a wiki, I presume . =Tony Meyer From tameyer at ihug.co.nz Thu Oct 7 00:29:46 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Oct 7 00:29:52 2004 Subject: [spambayes-dev] I18N In-Reply-To: Message-ID: > Besides sb_server and sb_tray in my SB bin directory I also > have sb_pop3dnd, sb_service, sb_upload and setup_server. > Never used them though. sb_pop3dnd is incomplete (or alpha, to be generous). I have great plans for this, but lack the time to finish it off nicely at the moment. 
There's no point doing any translation work with it at the moment, and, IAC, if sb_server is translated pretty much all of sb_pop3dnd would be too. sb_service just registers sb_server as a Windows service. I believe every message that is provided by it is actually provided by Windows itself, so there's no need for any translation there. sb_upload is just a command line tool to use with sb_server. The only thing that could be translated is the docstring. Given how little used this is, it doesn't seem worth it, at least at first. This (and sb_pop3dnd) are also cross-platform, BTW. setup_server will be completed for the first 1.1, and will need translating. What it is meant to do is automatically configure SpamBayes & the user's mailer. It needs polishing and (ideally) a nice little Wizard like the Outlook plug-in has. No point translating this until it's finished, though. > Sorry don't get it... Do you suggest to work (me) on a branch > or to send patches against HEAD? For the time being I rather > send patches to SF against cvs' head. Yes, patches against CVS HEAD. There is a 1.0 branch, on which only bugfixes should be committed, and which matches the 1.0 release. This one shouldn't be touched. CVS HEAD will turn into the 1.1a1 release, though, and so can include new features like this, as long as they don't break anything. > If I see that you work > too much and/or my code falls behind then I'd ask you to > branch SB for I18N if possible. This has come at a good time, when there's a lot of testing ahead for CVS HEAD before it gets released, and there's a nice stable branch that people can fall back to. As such, as long as the changes don't get too massive, working on the trunk should be fine, IMO. If it gets too large then we can create a branch. > My goal is to build or find a framework for translators as > small as possible. Sounds good to me :) [multiple instances of SpamBayes code] > Err.... Sorry. I think that what I really want is to share > the db. 
In that case, you still just run the appropriate addin.py, but don't need to bother with the default_configuration.ini file. Whichever version is registered will pick up the same database & configuration. [multiple translators] > You're probably right. Now imagine someone sending me a > message file containing say "skcus seyab maps" in it, and > claiming to be the kamandarian translation... This is open-source software, though, so people are free to check out whether any claims are true or not. And we have no liability if we say that there is a kamandarian translation and it's not correct. I'd rather trust people than turn them away... =Tony Meyer From tameyer at ihug.co.nz Thu Oct 7 00:43:03 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Oct 7 00:43:09 2004 Subject: [spambayes-dev] button ideas (oh boy) In-Reply-To: Message-ID: > I've done this before in C++, and unfortunately Microsoft > made it a bit complicated. ;) > I took a look at doing this in Python a while back. After > looking through the pywin32 source code for the clipboard > handling several times, I was unable to figure out any way > to get the bitmaps onto the clipboard in the required format. > I think we would need some enhancements to pywin32 to be > able to do this directly from Python. [...] > If so, how would we incorporate the .pyd into > our development process so that people can still run easily > from source? Ideally, you would submit a patch to pywin32, and Mark would check it in. Then we just build the binary with the new version, and all is good. For source users, we begin by providing some (possibly not as nice) compatibility option (eg ugly icons), and eventually require whatever pywin32 build it is. A variety of things have already gone through this (a large advantage of having the primary pywin32 developer be the primary Outlook plug-in developer). > The question is, do we think that effort would be worthwhile? *I* like the smileys.
However, (a) I'm not the one that would be putting in this effort, and (b) no-one else has piped up for them, so maybe it's just me... :) =Tony Meyer From tameyer at ihug.co.nz Thu Oct 7 08:01:52 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Oct 7 08:02:00 2004 Subject: [spambayes-dev] I18N In-Reply-To: Message-ID: > I believe that the easiest way to implement those without losing > portability is enhancing rcparse.py to treat any string in the rc file > it reads as it were entries of the messages db. Doing so, a > translator could work without touching the .rc file though the result > may not be perfect. +1. I'm only minorly familiar with rcparse.py, but I believe this wouldn't be difficult to do. =Tony Meyer From tameyer at ihug.co.nz Thu Oct 7 08:03:52 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Oct 7 08:03:57 2004 Subject: [spambayes-dev] button ideas (oh boy) In-Reply-To: Message-ID: > It would probably only be worthwhile from the standpoint of > maybe being less confusing for users. We've gotten quite a few > "the Recover from Spam button is missing" type questions in > the past, and those might be easier to answer > if the buttons are supposed to be there all the time. I would agree with that. > There's also still the issue of what is the right thing to do > if the user selects multiple messages. I think this is the most difficult question (unless it turns out that it's just simply too resource-expensive). I (obviously) can't think of a good answer, either. =Tony Meyer From kirebrow at yahoo.com Thu Oct 7 14:48:01 2004 From: kirebrow at yahoo.com (Erik Brown) Date: Thu Oct 7 14:48:35 2004 Subject: [spambayes-dev] button ideas (oh boy) In-Reply-To: Message-ID: <20041007124832.FWCN1756.imf17aec.mail.bellsouth.net@seeker> If the team is keeping with the smiley face, I've made a comp of new and improved graphics. They are 16 x 16 pixel dimension and ready for export, that is, if you guys decide to not hide the icons. 
http://www.headlinesmarketing.com/temp/updated-smiley.gif Erik Brown -----Original Message----- From: spambayes-dev-bounces@python.org [mailto:spambayes-dev-bounces@python.org] On Behalf Of Tony Meyer Sent: Tuesday, October 05, 2004 11:25 PM To: 'Kenny Pitt' Cc: 'SpamBayes-dev Forum' Subject: RE: [spambayes-dev] button ideas (oh boy) > Since I initiated this, I already had most of it on hand. I updated > the text in the docs to go along with the code changes, and I'll check > this in shortly. I've been using this since then, and it seems ok to me. It's taken a while to get used to shorter buttons, but it seems natural again now. The toolbar is certainly *much* smaller. > I'm not dealing yet with enabling based on individual message status, > but changing to disabling the buttons rather than hiding them is > extremely trivial. Do you think it's worth changing to disabling if we don't do it on a message by message basis? > I gave it a try and found one small problem with > it. We don't have any transparency in our icon images because of the > crazy way Outlook requires the transparency mask to be created, so the > disabled icon shows up as just a solid gray square. Not very > attractive. > > For a disabled icon, Outlook seems to gray out everything except > transparent or *white* pixels. I'm playing with some replacement > icons that use white symbols in the foreground (an X for Spam and a > checkmark for Not Spam). These also have backgrounds that fill the > entire icon square, thus eliminating the problems with the > non-transparent backgrounds around the current icons. The ones you made look nice, but I must admit that I'm quite attached to the little smiley face :) Can't we just mark the pixels around the smiley face as transparent? (I am very ignorant here). And maybe change the lines on the face from black to white? 
=Tony Meyer _______________________________________________ spambayes-dev mailing list spambayes-dev@python.org http://mail.python.org/mailman/listinfo/spambayes-dev From kenny.pitt at gmail.com Thu Oct 7 16:10:13 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Thu Oct 7 16:10:16 2004 Subject: [spambayes-dev] button ideas (oh boy) In-Reply-To: <20041007124832.FWCN1756.imf17aec.mail.bellsouth.net@seeker> References: <20041007124832.FWCN1756.imf17aec.mail.bellsouth.net@seeker> Message-ID: <2a052b9904100707104bb06250@mail.gmail.com> Erik Brown wrote: > If the team is keeping with the smiley face, I've made a comp of new and > improved graphics. They are 16 x 16 pixel dimension and ready for export, > that is, if you guys decide to not hide the icons. > > http://www.headlinesmarketing.com/temp/updated-smiley.gif Those icons come directly from MSN Messenger, so I'm not sure if there would be any copyright implications. I think I'd be more comfortable sticking with something created by someone on the team, not copied from another application. -- Kenny Pitt From sethg at GoodmanAssociates.com Thu Oct 7 18:02:03 2004 From: sethg at GoodmanAssociates.com (Seth Goodman) Date: Thu Oct 7 18:06:59 2004 Subject: [spambayes-dev] button ideas (oh boy) In-Reply-To: Message-ID: > From: Tony Meyer [mailto:tameyer@ihug.co.nz] > Sent: Thursday, October 07, 2004 1:04 AM <...> > > There's also still the issue of what is the right thing to do > > if the user selects multiple messages. > > I think this is the most difficult question (unless it turns out that it's > just simply too resource-expensive). I (obviously) can't think of a good > answer, either. A simple-minded approach: if pressing the button would result in some action on _any_ of the highlighted messages, leave the button enabled. That is, if any of the selected messages are either untrained or trained opposite to the meaning of the button, leave the button enabled and attempt processing of all highlighted messages.
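As a rough sketch of the "any message" rule just described: the names `Trained` and `button_enabled` below are hypothetical and do not come from the SpamBayes source, and the real plug-in would have to read each message's training state from its own database rather than from a plain list.

```python
from enum import Enum
from typing import Iterable


class Trained(Enum):
    """Hypothetical per-message training state (not a SpamBayes type)."""
    UNTRAINED = "untrained"
    HAM = "ham"
    SPAM = "spam"


def button_enabled(button: Trained, selection: Iterable[Trained]) -> bool:
    """Decide whether a training button should be enabled.

    `button` is the state the button trains towards (Trained.SPAM for the
    "Spam" button, Trained.HAM for "Not Spam").  The button stays enabled
    if at least one selected message is untrained or trained the other
    way, i.e. if pressing it would change something.
    """
    return any(state is not button for state in selection)
```

With this rule an empty selection, or a selection made up only of messages already trained the same way as the button, leaves the button disabled; any mixed selection keeps it enabled.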
SpamBayes currently does the right thing on any message where the action does not make sense. A really simple-minded approach: always leave both buttons enabled in all menus. If a user highlights multiple messages and selects an action, do the action wherever it makes sense. If the action does not make sense on any of the messages, so be it. That is exactly what happens now if you move (with incremental training disabled) messages to the Unsure folder. Both buttons are enabled and you can make multiple selections and press either button. It seems to do the right thing all the time, that is, train on any untrained messages and untrain and retrain on any incorrectly trained messages. I think this would be a big improvement compared to what we have now. Disabling the buttons where it makes sense is a really user-friendly feature, but if the choice were between buttons that are always enabled in all folders and the status quo, I would vote for providing the buttons in all folders. It would be no more confusing than the present situation in the Unsure folder. No matter what you do there will be user questions. For context-sensitive buttons, it will be, "why doesn't the button appear enabled when I select these messages?". There might also be bugs to fix related to when the buttons are enabled. For "always on" buttons, the only question would probably be, "I pressed the Spam button, but the SpamBayes Manager doesn't show the number of trained spam as any different". I suggest there will be fewer of the latter, since most people probably don't look at the SpamBayes Manager all that often. You can also gauge this by how many people currently ask you that question for the Unsure folder. -- Seth Goodman From adam.walker at rbwconsulting.com Thu Oct 7 20:15:47 2004 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Thu Oct 7 20:15:52 2004 Subject: [spambayes-dev] I18N In-Reply-To: References: Message-ID: Being the one who wrote rcparse.py, I give this a +1.
The only free RC editors I found back when I wrote this were for Win3.1 and very painful to use. Even the MS tools considered Mark and me to be working on different dialogs because his were "EN_au" and mine were "EN_us". --Adam Walker On Oct 7, 2004, at 2:01 AM, Tony Meyer wrote: >> I believe that the easiest way to implement those without losing >> portability is enhancing rcparse.py to treat any string in the rc file >> it reads as it were entries of the messages db. Doing so, a >> translator could work without touching the .rc file though the result >> may not be perfect. > > +1. > > I'm only minorly familiar with rcparse.py, but I believe this wouldn't > be difficult to do. > > =Tony Meyer From hatukanezumi at users.sourceforge.net Fri Oct 8 04:51:15 2004 From: hatukanezumi at users.sourceforge.net (Hatuka*nezumi) Date: Fri Oct 8 04:55:10 2004 Subject: [spambayes-dev] I18N In-Reply-To: References: Message-ID: <20041008115115.0298089e.hatukanezumi@users.sourceforge.net> On Wed, 6 Oct 2004 17:59:22 +0200 Hernan Martinez Foffani wrote: > Current rcparse.py skips lang entries. rcparse.py also seems to skip the code_page pragma. Most Western European languages require code page 1252, Japanese requires code page 932, etc. --- nezumi From hernan at orgmf.com.ar Fri Oct 8 10:56:16 2004 From: hernan at orgmf.com.ar (Hernan Martinez Foffani) Date: Fri Oct 8 10:57:13 2004 Subject: [spambayes-dev] I18N In-Reply-To: <20041008115115.0298089e.hatukanezumi@users.sourceforge.net> Message-ID: >> Current rcparse.py skips lang entries.
> > rcparse.py also seems to skip code_page pragma. > > Most of western european requires code page 1252, and japanese > requires code page 932, etc. While I don't think we need to make rcparse.py fully compliant with the Microsoft specifications (for instance, I would expect one and only one dialogs.rc file per language, so lang entries are unimportant), you did raise a valid point. We may need a way to let rcparse.py know the encoding of the strings in the rc file, and that encoding must be compatible with gettext (or similar) and with the visual resource editors. There's one more task in the queue... -H. From tim.peters at gmail.com Fri Oct 8 23:28:59 2004 From: tim.peters at gmail.com (Tim Peters) Date: Fri Oct 8 23:29:11 2004 Subject: [spambayes-dev] SpamBayes announce list & moderation In-Reply-To: References: Message-ID: <1f7befae04100814286a37a921@mail.gmail.com> [Tony Meyer] wrote: > According to spambayes-announce is > moderated: > ... > Am I wrong, and it is moderated? If not, should it be? It should be. I don't know whether it was, or whether it is now, but I sure tried to make it moderated. At least everyone's "mod" bit is set now. From kennypitt at hotmail.com Wed Oct 13 22:08:25 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Oct 13 22:09:07 2004 Subject: [spambayes-dev] RE: [Spambayes] Need assistance. Trying to make an Outlook binary install from source. In-Reply-To: <20041013171814.JHHP2348.imf20aec.mail.bellsouth.net@seeker> Message-ID: DUI-DWI wrote: > This is where I'm stuck however. The readme in the > Outlook2000/Installer directory says to execute the following to > create the 'dist' and 'build_spambayes' directories: We don't use the McMillan Installer to build the binary anymore, so that readme is probably way out of date with the current state of the source. Go to the windows/py2exe directory instead and run setup_all.py. Then go up to the windows directory and load spambayes.iss into InnoSetup to build the installer.
Note that you'll need to run setup_all.py on a system that has Outlook 2000 installed or it won't find the type libraries it needs. FYI: Questions like this are best asked on the spambayes-dev@python.org list, which is intended for those who are modifying and rebuilding SpamBayes instead of just using it. While most of the developers follow both lists, things occasionally get lost in the clutter of the higher traffic on the spambayes list. -- Kenny Pitt From kirebrow at yahoo.com Thu Oct 14 18:12:32 2004 From: kirebrow at yahoo.com (Erik Brown) Date: Thu Oct 14 18:13:14 2004 Subject: [spambayes-dev] Any updated information for the plans to make better statistics? Message-ID: <20041014161309.BEC2444.imf19aec.mail.bellsouth.net@seeker> SpamBayes Team, Any updated information for the plans to make better statistics (in terms of persistence, presentation, and what's provided) especially for the Outlook addin? Really, all I'm looking for is a persistent stat area (much like the current per session one) but includes classification accuracy percentage which is calculated from the total messages processed and classification errors. I'm presuming that the errors would be the messages that were manually classified as good or spam and/or the false positives and false negatives? This way I can better test this wonderful tool and provide much more concrete test results when using the Outlook addin. In terms of the spam filtering accuracy when using different training methods and/or options for the SpamBayes engine. Any information on this would be greatly appreciated and thanks for all the hard work. Erik Brown -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20041014/bc9a61c0/attachment.html From tameyer at ihug.co.nz Thu Oct 14 22:36:16 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Oct 14 22:36:49 2004 Subject: [spambayes-dev] Any updated information for the plans to makebetter statistics? In-Reply-To: Message-ID: > Any updated information for the plans to make better > statistics (in terms of persistence, presentation, and > what's provided) especially for the Outlook addin? I'm planning on adding persistence (like sb_server/sb_imapfilter already have) in the near future (it would certainly make it into 1.1a1); I've done some of the coding already. Presentation is more tricky - I presume that what would be nice are some graphs, but I don't have the skills to draw a dynamic graph in an Outlook dialog (if anyone provides example code to do this, then I'd happily tie in the SpamBayes stuff). > Really, all I'm looking for is a persistent stat area > (much like the current per session one) but includes > classification accuracy percentage which is calculated > from the total messages processed and classification > errors. I'm presuming that the errors would be the > messages that were manually classified as good or spam > and/or the false positives and false negatives? I'm not sure about adding an "accuracy percentage", since it's hard to define (what do you do with unsures?), and so might mislead people, since there presumably wouldn't be room to explain it in the dialog. You're welcome to try and convince me otherwise, of course; it would certainly be an easy addition. However, once the stats are persistent, it would be simple for you to calculate any such figure yourself, of course. Would that suffice? > This way I can better test this wonderful tool and provide > much more concrete test results when using the Outlook > addin. 
In terms of the spam filtering accuracy when using > different training methods and/or options for the SpamBayes > engine. If you want to do serious testing, then you'd be better off using the testing scripts that are in the source archive (timcv.py and incremental.py). These don't rely on you consistently doing the same thing, and you can easily test multiple options/regimes over the same mail set. There's a bit of a learning curve involved, but it's not too bad. =Tony Meyer --- Please always include the list (spambayes@python.org) in your replies (reply-all), and please don't send me personal mail about SpamBayes. This way, you get everyone's help, and avoid a lack of replies when I'm busy. From tameyer at ihug.co.nz Fri Oct 15 01:48:55 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Fri Oct 15 01:49:04 2004 Subject: [spambayes-dev] Any updated information for the plans tomakebetter statistics? In-Reply-To: Message-ID: [me, earlier today] > I'm planning on adding persistence (like sb_server/sb_imapfilter > already have) in the near future (it would certainly make it into > 1.1a1); I've done some of the coding already. The message prompted me to finish this off (there wasn't much left). The CVS version of the plug-in now has persistent stats (they are saved in a wee pickle in the data directory). =Tony Meyer From kirebrow at yahoo.com Fri Oct 15 04:31:38 2004 From: kirebrow at yahoo.com (Erik Brown) Date: Fri Oct 15 04:32:23 2004 Subject: [spambayes-dev] Any updated information for the planstomakebetter statistics? In-Reply-To: Message-ID: <20041015023219.QBVU2361.imf16aec.mail.bellsouth.net@seeker> Tony, AWSOME!!!!!!!!!!!!!!!!!! I can't wait to test this, too bad the CVS is not real time... = ) I'll get back to you on your previous email concerning this, have to do my research. 
Erik Brown From tameyer at ihug.co.nz Fri Oct 15 07:11:08 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Fri Oct 15 07:11:16 2004 Subject: [spambayes-dev] RE: [Spambayes] Need assistance. Trying to make anOutlook binaryinstall from source. In-Reply-To: Message-ID: > We don't use the McMillan Installer to build the binary > anymore, so it is probably way out of date with the current > state of the source. FWIW, I've removed the McMillan stuff from the wiki, since you can't even download versions as old as 008.1 anymore. =Tony Meyer From kenny.pitt at gmail.com Fri Oct 15 16:10:47 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Fri Oct 15 16:12:12 2004 Subject: [spambayes-dev] RE: [Spambayes] Need assistance. Trying to make anOutlook binaryinstall from source. In-Reply-To: References:

Message-ID: <2a052b9904101507104631aac1@mail.gmail.com> Tony Meyer wrote: > > We don't use the McMillan Installer to build the binary > > anymore, so it is probably way out of date with the current > > state of the source. > > FWIW, I've removed the McMillan stuff from the wiki, since you can't even > download versions as old as 008.1 anymore. Since we're not supporting this anymore, is there any reason not to remove the files in Outlook2000/installer from latest CVS? -- Kenny Pitt From kennypitt at hotmail.com Fri Oct 15 17:01:31 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Fri Oct 15 17:02:04 2004 Subject: [spambayes-dev] Any updated information for the planstomakebetter statistics? In-Reply-To: Message-ID: Tony Meyer wrote: > [me, earlier today] >> I'm planning on adding persistence (like sb_server/sb_imapfilter >> already have) in the near future (it would certainly make it into >> 1.1a1); I've done some of the coding already. > > The message prompted me to finish this off (there wasn't much left). > The CVS version of the plug-in now has persistent stats (they are > saved in a wee pickle in the data directory). Don't know if it's related, but I don't think anything else has changed. I updated from CVS to get the stats changes and now I'm getting an exception from the addin. Here's the exception message: pythoncom error: Python error invoking COM method. 
Traceback (most recent call last): File "C:\Dev\Lang\Python23\Lib\site-packages\win32com\server\policy.py", line 283, in _Invoke_ return self._invoke_(dispid, lcid, wFlags, args) File "C:\Dev\Lang\Python23\Lib\site-packages\win32com\server\policy.py", line 288, in _invoke_ return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None) File "C:\Dev\Lang\Python23\Lib\site-packages\win32com\server\policy.py", line 616, in _invokeex_ return DesignatedWrapPolicy._invokeex_( self, dispid, lcid, wFlags, args, kwArgs, serviceProvider) File "C:\Dev\Lang\Python23\Lib\site-packages\win32com\server\policy.py", line 550, in _invokeex_ return func(*args) File "C:\src\python\spambayes\Outlook2000\addin.py", line 382, in OnItemAdd self.manager.LogDebug(2, "OnItemAdd event for folder", self.name, File "C:\Dev\Lang\Python23\Lib\site-packages\win32com\client\__init__.py", line 454, in __getattr__ raise AttributeError, "'%s' object has no attribute '%s'" % (repr(self), attr) exceptions.AttributeError: '' object has no attribute 'name' I'll look into it further if I get the chance, but wanted to make you aware of it. -- Kenny Pitt From tameyer at ihug.co.nz Sun Oct 17 00:55:01 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Sun Oct 17 00:55:13 2004 Subject: [spambayes-dev] Any updated information for the planstomakebetter statistics? In-Reply-To: Message-ID: > Don't know if it's related, but I don't think anything else > has changed. I updated from CVS to get the stats changes and > now I'm getting an exception from the addin. Here's the > exception message: [...] > exceptions.AttributeError: ' 0x122933000>' object has no attribute 'name' Sorry, this was an unrelated change that I absent-mindedly checked in at the same time. 
(I copied over the stats stuff from the checkout that I use day-to-day to my main working copy to avoid checking in the Outlook tte stuff I'm playing around with - but that copy already had that change, which I forgot that I dumped after I figured it didn't work). Anyway, I've backed that bit out and it should work properly now. Apologies & thanks for pointing it out :) =Tony Meyer From tameyer at ihug.co.nz Sun Oct 17 00:58:22 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Sun Oct 17 00:58:28 2004 Subject: [spambayes-dev] RE: [Spambayes] Need assistance. Trying to make anOutlook binaryinstall from source. In-Reply-To: Message-ID: > Since we're not supporting this anymore, is there any reason > not to remove the files in Outlook2000/installer from latest CVS? Good point; done. =Tony Meyer From kirebrow at yahoo.com Sun Oct 17 08:44:43 2004 From: kirebrow at yahoo.com (Erik Brown) Date: Sun Oct 17 08:45:20 2004 Subject: [spambayes-dev] Any updated information for the planstomakebetterstatistics? In-Reply-To: Message-ID: <20041017064514.OYBS2486.imf17aec.mail.bellsouth.net@seeker> I have been testing this and I am catching an error. It appears that it is not calculating the unsure percentage from session to session, even though the number of unsure messages is shown. Below is the log. I highlighted the two parts that show this discrepancy. --------------------------------------------- SpamBayes has processed 48 messages - 5 (10%) good, 42 (88%) spam and 1 (2%) unsure No messages were manually classified as good 1 message(s) were manually classified as spam (with 0 being false negatives) Addin terminating: 2 COM client and 2 COM servers exist. Loaded bayes database from 'C:\Documents and Settings\Erik M. Brown\Application Data\SpamBayes\default_bayes_database.db' Loaded message database from 'C:\Documents and Settings\Erik M. 
Brown\Application Data\SpamBayes\default_message_database.db' Bayes database initialized with 639 spam and 596 good messages SpamBayes Outlook Addin Version 1.0rc1 (May 2004) starting (with engine SpamBayes Engine Version 0.3 (January 2004)) on Windows 5.1.2600 (Service Pack 1) using Python 2.3.4 (#53, May 25 2004, 21:17:02) [MSC v.1200 32 bit (Intel)] Log created Sun Oct 17 02:24:41 2004 SpamBayes: Watching (for filtering) in 'Personal Folders/Inbox' Processing missed spam in folder 'Inbox' by starting a timer FAILED to add the toolbar item 'SpamBayesCommand.Manager' - (-2147352567, 'Exception occurred.', (0, None, None, None, 0, -2147467259), None) The above toolbar message is common - recreating the toolbar... Saving configuration -> C:\Documents and Settings\Erik M. Brown\Application Data\SpamBayes\Outlook.ini SpamBayes: Watching (for filtering) in 'Personal Folders/Inbox' Message 'Woww..8o-% 0ff Adaay' in 'Personal Folders/Inbox' had a Spam classification of 'Yes' Saving configuration -> C:\Documents and Settings\Erik M. Brown\Application Data\SpamBayes\Outlook.ini SpamBayes: Watching (for filtering) in 'Personal Folders/Inbox' SpamBayes - Disconnecting from Outlook Session: SpamBayes has processed 1 messages - 0 (0%) good, 1 (100%) spam and 0 (0%) unsure No messages were manually classified as good No messages were manually classified as spam Total: SpamBayes has processed 49 messages - 5 (10%) good, 43 (88%) spam and 1 (0%) unsure No messages were manually classified as good 1 message(s) were manually classified as spam (with 0 being false negatives) Addin terminating: 2 COM client and 2 COM servers exist. 
-------------------------------------------------- Erik Brown Webmaster for www.drunkdrivingdefense.com Webmaster for www.dui-dwi.com Email: kirebrow@yahoo.com -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20041017/cb189fb7/attachment.htm From kirebrow at yahoo.com Sun Oct 17 21:34:43 2004 From: kirebrow at yahoo.com (Erik Brown) Date: Sun Oct 17 21:35:31 2004 Subject: [spambayes-dev] Any updated information for the planstomakebetterstatistics?
In-Reply-To: Message-ID: <20041017193525.WUSP2486.imf17aec.mail.bellsouth.net@seeker> I just deleted the pickle file, to restart the stat calculation and it seems it is doing the same thing with the unsure percentage. Log file below: ------------------------------- SpamBayes has processed 55 messages - 4 (7%) good, 49 (89%) spam and 2 (4%) unsure 1 message(s) were manually classified as good (with 0 being false positives) 1 message(s) were manually classified as spam (with 0 being false negatives) Addin terminating: 2 COM client and 2 COM servers exist. Loaded bayes database from 'C:\Documents and Settings\Erik M. Brown\Application Data\SpamBayes\default_bayes_database.db' Loaded message database from 'C:\Documents and Settings\Erik M. Brown\Application Data\SpamBayes\default_message_database.db' Bayes database initialized with 640 spam and 597 good messages SpamBayes Outlook Addin Version 1.0rc1 (May 2004) starting (with engine SpamBayes Engine Version 0.3 (January 2004)) on Windows 5.1.2600 (Service Pack 1) using Python 2.3.4 (#53, May 25 2004, 21:17:02) [MSC v.1200 32 bit (Intel)] Log created Sun Oct 17 15:30:38 2004 SpamBayes: Watching (for filtering) in 'Personal Folders/Inbox' Processing missed spam in folder 'Inbox' by starting a timer FAILED to add the toolbar item 'SpamBayesCommand.Manager' - (-2147352567, 'Exception occurred.', (0, None, None, None, 0, -2147467259), None) The above toolbar message is common - recreating the toolbar... Saving configuration -> C:\Documents and Settings\Erik M. 
Brown\Application Data\SpamBayes\Outlook.ini SpamBayes: Watching (for filtering) in 'Personal Folders/Inbox' SpamBayes - Disconnecting from Outlook Session: SpamBayes has processed zero messages Total: SpamBayes has processed 55 messages - 4 (7%) good, 49 (89%) spam and 2 (0%) unsure ---------------------------------- Erik Brown Webmaster for www.drunkdrivingdefense.com Webmaster for www.dui-dwi.com Email: kirebrow@yahoo.com -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20041017/792f93ad/attachment.htm From kirebrow at yahoo.com Sun Oct 17 22:24:58 2004 From: kirebrow at yahoo.com (Erik Brown) Date: Sun Oct 17 22:25:35 2004 Subject: [spambayes-dev] Any updated information for the plans tomakebetter statistics? In-Reply-To: Message-ID: <20041017202533.MBSJ2348.imf20aec.mail.bellsouth.net@seeker> > I'm not sure about adding an "accuracy percentage", since it's hard to > define (what do you do with unsures?), and so might mislead people, since > there presumably wouldn't be room to explain it in the dialog. You're > welcome to try and convince me otherwise, of course; it would certainly be > an easy addition. I originally wanted an accuracy percentage that was something like POPFile's. It may just be me, but seeing the percentage number (99.1) increase (or decrease) over time is just cool. = ) I think the way POPFile's stats currently works is that you have to reclassify "unclassified" mail before the stats take a hit. If you see any validation in including this number, all manually re-classified email would count as an error (in any folder: ham, spam, unsure). This way, if you get a message in the unsure folder, and it is a message that you would not want to train that would possibly taint the corpus, the accuracy percentage won't take a hit in these rare occasions. However, if this sounds like a bad idea, I can always subtract the unsure percentage from 100, then somehow figure in the false positives and false negatives if any. Would they be counted as unsure after you re-classify them btw? Erik brown From tameyer at ihug.co.nz Mon Oct 18 01:13:49 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Oct 18 01:13:56 2004 Subject: [spambayes-dev] Any updated information for the planstomakebetterstatistics? In-Reply-To: Message-ID: > I have been testing this and I am catching an error. 
> It appears that it is not calculating the unsure > percentage from session to session, even though the > number of unsure messages is shown. Thanks; fixed. =Tony Meyer From kirebrow at yahoo.com Tue Oct 19 11:07:02 2004 From: kirebrow at yahoo.com (Erik Brown) Date: Tue Oct 19 11:07:39 2004 Subject: [spambayes-dev] Any updated information for the planstomakebetterstatistics? In-Reply-To: Message-ID: <20041019090735.CDHS19872.imf17aec.mail.bellsouth.net@seeker> Tony, Is there any way you can add a decimal point to the stat percentages? = ) Erik Brown Webmaster for www.drunkdrivingdefense.com Webmaster for www.dui-dwi.com Email: kirebrow@yahoo.com -----Original Message----- From: Tony Meyer [mailto:tameyer@ihug.co.nz] Sent: Sunday, October 17, 2004 7:14 PM To: 'Erik Brown'; 'Kenny Pitt' Cc: spambayes-dev@python.org Subject: RE: [spambayes-dev] Any updated information for the planstomakebetterstatistics? > I have been testing this and I am catching an error. > It appears that it is not calculating the unsure > percentage from session to session, even though the > number of unsure messages is shown. Thanks; fixed. =Tony Meyer From tameyer at ihug.co.nz Wed Oct 20 02:05:26 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Oct 20 02:05:31 2004 Subject: [spambayes-dev] Any updated information for the planstomakebetterstatistics? In-Reply-To: Message-ID: > Is there any way you can add a decimal point to the stat > percentages? = ) Done. The oastats.Stats().GetStats() function now takes a "decimal_places" argument that does what you would expect. I've set it to default to 1, but you could modify that for your local copy if you wanted. =Tony Meyer From hernan at orgmf.com.ar Wed Oct 20 18:56:45 2004 From: hernan at orgmf.com.ar (=?iso-8859-1?Q?Hern=E1n_Mart=EDnez_Foffani?=) Date: Wed Oct 20 18:57:47 2004 Subject: [spambayes-dev] State of I18N Message-ID: I made some tests and managed to build a partially translated version of the plugin. Follows what I found so far. 
Please read every paragraph I wrote as ending with a "What do you think?", "Is this the right approach?" or "Am I screwing something up?" kind of question.

GET THE USER LANGUAGE

I'm using locale.getdefaultlocale() to get the user's language preference. I'll add the option to let the user specify another language in his config file.

A different approach would be to get Outlook's language-id version (how?) instead of the user's locale and convert it later to Python's RFC1766 format code.

Languages are "stackable", so they can naturally fall back one onto the other if their translations are missing or incomplete.

PURE PYTHON LITERAL STRINGS

Using gettext, I found that adding a small class and a couple of lines at the beginning of manager.py:BayesManager.__init__() is enough to provide the infrastructure for the Python code part.

I'll leave asserts and LogDebug comment strings out of the translation. The rest would be tedious but easy to do. I'll follow Tony's advice and start the addin translation after sb_server, but I want to define the translation tools first.

Some things may require some rework though:
- Multiline messages that use one print statement per line. (Replace those by "xxx\r\n" "yyy\r\n" "zzz\r\n"?)
- Those that mix literal strings with parameters by concatenating them instead of using placeholders.
- The special case of literals mixed with HTML. (see addin.py:ShowClues)
- There are literal words around that might be used for program flow (eg: "ham", "spam", etc.) Not sure about it.

Anyway, no big surprises.

gettext seems good for several reasons:
- Python officially blessed it,
- the Python distribution has (almost) all the tools a translator needs,
- it minimizes changes to the source code,
- it includes fallbacks to language/country and
- I like it.

DIRECTORIES

The gettext library uses a {SomeDir}/langcode/LC_MESSAGES/message_file layout. In the sources I'm working on I made {SomeDir} a "Languages" directory parallel to the Outlook2000, pspam and scripts directories.
That way I can place all the translated message files in one directory, and manager.py & co. can reach them at ../Languages.

Is it too hard to test this layout for a frozen distribution?

DIALOGS

These were tricky. The binary dist of SB imports them directly, while the source addin compiles them first. I don't think it's necessary to reproduce the same thing for foreign dialog.rc's. A translator can run rc2py at the command prompt to create the corresponding .py file.

To imitate the fallback behaviour for dialogs I had to play a bit with sys.path. Setting a language, besides initializing gettext, appends __file__/../Languages/langcode/DIALOGS and __file__/../Languages/langcode[0:2]/DIALOGS to sys.path. (Actually it's not __file__ but this_filename, and it's not [0:2] but the part before the first "_".)

Say the locale is "es_ES": SB will then try to import "i18n_dialog.py" from "Languages/es_ES/DIALOGS", then from "Languages/es/DIALOGS". If that fails, it falls back to the current code.

GETTEXTED DIALOGS

I mentioned before that I would like to make rcparser.py gettext-aware so a translator could work out a solution without having to edit the resource dialogs.

The problem I found is that the output of the parser is a dict that includes string literals that should be translated (like captions or labels) and literals that should not (like font names). Once they are in the dict there's no way to differentiate between them. As I expect that changes in the dict would imply changes all over SB, I thought of a partial solution (call it a hack) that may work.

AFAICT, SB does not use the output of rcparse directly but through a previous rc2py step. So how about a subclass of str that overrides __repr__? Then rcparse only needs to create instances of this class for labels, captions, etc., not for font names (or every literal can be of this new type and an attribute flag can drive __repr__ accordingly). Later, rc2py and imports of the dialogs .py would do the right thing.
Something like this:

    class gt_str(str):
        def __repr__(self):
            return "_(" + super(gt_str, self).__repr__() + ")"

    >>> a="a"
    >>> b=gt_str("a")
    >>> a==b
    True
    >>> a
    'a'
    >>> b
    _('a')
    >>>

I said it's a partial solution because the dict that rcparse generates does not match (its items are of different types) the one obtained by importing the dialog. Does it matter? (It's also "partial" because I haven't tested it yet. heh..)

TRANSLATOR TOOLS

With this approach a translator would need:
- Outlook with SB (binary dist is enough)
- Python (for the gettext tools)
- the rcparse/rc2py tools
also highly recommended:
- a free resource editor
and optionally:
- SB source distro (I think it's not needed, really)

To translate:
- the "all_the_messages_in_SB" file (I can provide it for each new version of SB)
- the dialogs.rc file for each new version of SB.

Still, I have to search for a tool or procedure that lets a translator know which messages are not translated yet. If you know of a smart comparison tool for .po files (gettext message files), please tell me.

That's all for the time being. Soon, I'm going to polish the changes I made and upload the corresponding patches to sf.net for your review.

Regards,
-Hernán.

PS: Please bear with my English.

From hernan at orgmf.com.ar Wed Oct 20 20:23:25 2004
From: hernan at orgmf.com.ar (Hernán Martínez Foffani)
Date: Wed Oct 20 20:24:26 2004
Subject: [spambayes-dev] RE: State of I18N
In-Reply-To:
Message-ID:

I forgot to tell you that I'm binding _ in the builtins namespace (through gettext.install()). While it's handy, I don't know if that's ok for you.

[Regarding subclassing str]
>
> class gt_str(str):
>     def __repr__(self):
>         return "_(" + super(gt_str, self).__repr__() + ")"
>
> I said it's a partial solution because the dict that
> rcparse generates does not equal (items of different types)
> the one obtained by importing the dialog. Does it matter?
> (It's also "partial" because I haven't tested it yet. heh..)

Tested.
A minor change was needed to avoid _() on empty strings. It seems to work ok.

-H.

From kennypitt at hotmail.com Wed Oct 20 20:51:19 2004
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Wed Oct 20 20:52:04 2004
Subject: [spambayes-dev] State of I18N
In-Reply-To:
Message-ID:

Hernán Martínez Foffani wrote:
> I mentioned before that I would like to make rcparser.py gettext
> aware so a translator could work out a solution without having to
> edit the resource dialogs.
>
> The problem I found is that the output of the parser is a dict that
> includes string literals subject to be translated (like caption or
> labels) and literals that do not (like font names.) Once in the dict
> there's no way to differentiate between them.
> As I expect that changes in the dict would imply changes all over SB,
> I thought of a partial solution (call it a hack) that may work.
>
> AFAICT, SB does not use the output of rcparse directly but through a
> rc2py previous process. So how about a subclass of str that override
> __repr__ ? Then, rcparse only needs to create instances of this
> class for labels, captions, etc., not for font names (or every
> literal can be of this new type and an attribute flag can drive
> __repr__ accordingly.)

I'm not that familiar with gettext, so I may be way off base here. However, if you can make the parser smart enough to use different classes for translatable vs. non-translatable strings, wouldn't it be possible to just put the _() around the translatable strings and leave it off for non-translatable ones? Once the strings are written out to dialogs.py with the correct markers, you could run the pygettext utility against it to extract the appropriate strings for translation.
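The str-subclass idea discussed in this thread can be sketched roughly as follows. This is only an illustration in present-day Python, not the actual patch: gt_str follows the class quoted above, the empty-string guard is a guess at the "minor change" Hernán mentions, and the dialog dict and its keys are hypothetical rather than rcparser.py's real output format.

```python
class gt_str(str):
    """A str whose repr() is wrapped in _() so dumped source is gettext-ready."""

    def __repr__(self):
        plain = super().__repr__()
        # Assumption: empty strings stay unwrapped, because _('') in a real
        # gettext catalog yields the catalog header, not an empty string.
        if not self:
            return plain
        return "_(" + plain + ")"


# Hypothetical parser output: only caption/label text is marked translatable.
dialog = {
    "caption": gt_str("Filtering"),
    "font": "MS Sans Serif",
    "label": gt_str(""),
}

# rc2py is described as dumping the structure with repr(), so the _() markers
# end up only around the non-empty translatable entries.
dumped = repr(dialog)
print(dumped)
```

Because gt_str still compares equal to the plain string, code that only reads the values should be unaffected; only the repr() dump changes.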
-- 
Kenny Pitt

From hernan at orgmf.com.ar Thu Oct 21 00:06:40 2004
From: hernan at orgmf.com.ar (Hernan Martínez Foffani)
Date: Thu Oct 21 00:06:38 2004
Subject: [spambayes-dev] State of I18N
In-Reply-To:
Message-ID:

[Kenny Pitt]
> Hernán Martínez Foffani wrote:
>> I mentioned before that I would like to make rcparser.py gettext
>> aware so a translator could work out a solution without having to
>> edit the resource dialogs.
>>
>> The problem I found is that the output of the parser is a dict that
>> includes string literals subject to be translated (like caption or
>> labels) and literals that do not (like font names.) Once in the dict
>> there's no way to differentiate between them.
>> As I expect that changes in the dict would imply changes all over SB,
>> I thought of a partial solution (call it a hack) that may work.
>>
>> AFAICT, SB does not use the output of rcparse directly but through a
>> rc2py previous process. So how about a subclass of str that override
>> __repr__ ? Then, rcparse only needs to create instances of this
>> class for labels, captions, etc., not for font names (or every
>> literal can be of this new type and an attribute flag can drive
>> __repr__ accordingly.)
>
> I'm not that familiar with gettext, so I may be way off base here.

Sorry, I should have explained it a bit first. But you're not off base here.

To make it short: gettext lets you initialize a specific language environment so that later, in your program, every call to _("A message") is translated on the fly through the corresponding mapping table, e.g. "A message" -> "Un mensaje".

gettext is especially suited to the case where you have a program full of literal strings and you want to i18n it. All you have to do is set up the environment (a couple of lines of code) and surround every literal string in your code with _(). The "_" is just a binding to a gettext function name.
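As a rough illustration of that binding, here is a minimal sketch. It assumes a dict-backed catalog instead of a compiled .mo file so that it runs on its own; the DictTranslations class and its contents are hypothetical, not part of SpamBayes or of gettext itself.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical stand-in for a compiled .mo catalog, backed by a dict."""

    def __init__(self, table):
        super().__init__()
        self._table = table

    def gettext(self, message):
        # Fall back to the untranslated message when there is no entry.
        return self._table.get(message, message)


spanish = DictTranslations({"A message": "Un mensaje"})

# "_" is nothing more than a name bound to the catalog's gettext method.
_ = spanish.gettext

translated = _("A message")
untranslated = _("Another message")
print(translated, "|", untranslated)
```

With real catalogs on disk, the translation object would normally come from gettext.translation(domain, localedir, languages=[...], fallback=True) rather than from a hand-written class.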
The remaining task is for translators to build the table mapping (a text file that could be released separately from the program). > However, if you can make the parser smart enough to use different > classes for translatable vs. non-translatable strings, wouldn't it be > possible to just put the _() around the translatable strings and > leave it off for non-translatable ones? Actually, that's what I'm proposing but with a little twist. The type of the non-translatable string will be the same as it is now and the type of the translatable strings will be a subclass of str (unicode?). These are the ones that I need to have __repr__ overridden. I could implement a standard root class for the translatable strings, something like:

    class translatable_str:
        def __init__(self, s):
            self.s = s
        def __repr__(self):
            # dump the literal wrapped in _() so it is marked for translation
            return "_(" + repr(self.s) + ")"

but then the dict that rcparse produces wouldn't have instances of str for the translatable items. It may be impossible to use that dict directly for win32all dialogs without dumping it first to a .py file, but I don't know if that is a problem. BTW, I wanted to override (or implement) __repr__ because rc2py just calls repr() on the dict to dump it and create the dialogs.py file. > Once the strings are written > out to dialogs.py with the correct markers, you could run the > pygettext utility against it to extract the appropriate strings for > translation. Yes. That's the idea. -H. From tameyer at ihug.co.nz Thu Oct 21 05:42:00 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Oct 21 05:42:12 2004 Subject: [spambayes-dev] State of I18N In-Reply-To: Message-ID: > I'm using locale.getdefaultlocale() to get the user lang > preference. I'll add the chance to let the user specify > another language in his config file. > > A different approach would be to get Outlook's language-id > version (how?) instead of the user's locale and convert it > later to Python's RFC1766 format code.
I'd stick with the former, so that this naturally carries past the Outlook plug-in into the other scripts. Something like a [globals] language option, that allows multiple string values (a list of preferred languages in order). I can add that if you like. I would probably have the default (in Options.py) be "English", and then have a page on the configuration wizard(s) that lets you select the preferred language(s), and have that default to locale.getdefaultlocale() (or maybe, for Outlook, the Outlook language). > I'll leave out of translation assert's and LogDebug comment > strings. Yes, there isn't much point in translating those. In fact, it could make things harder, since I wouldn't want to have to try to figure out someone's problem given a log in (eg) Spanish. > Some things may require some rework though: > - Multiline messages that use one print sentence per line. > (Replace those by "xxx\r\n" "yyy\r\n" "zzz\r\n"?) Is this to make the translating easier so you have more context, or for some other reason? > - Those that mix literal strings with parameters concatenating > them instead of using placeholders. You mean "string" + var + "string" instead of "string %s string" % (var,)? I'd much rather stick with the latter, if possible. > - The special case of mix literals with HTML. > (see addin.py:ShowClues) This is the only case of that (other than the whole of the web interface, which is in ui.html), IIRC. We can probably figure out something to do there. > - There are literals words around that might be used for > program flow (eg: "ham","spam",etc.) Not sure about it. Ignore those, and I'll check them in the patch. There shouldn't be - for example, "ham", "spam" and "unsure" are all configurable options, so they ought to be referred to via the options class. (This might not always be the case with the Outlook plug-in, but I can fix that if necessary). > DIRECTORIES > > The gettext library uses {SomeDir}/langcode/LC_MESSAGES/message_file > layout. 
In the sources I'm working on I made {SomeDir} as > "Languages" parallel to Outlook2000, pspam and scripts directories. "languages" in that directory sounds fine to me for the source. > Doing so I can place all the translated messages files in one > directory and manager.py&co can reach them at ../Languages. > > Is it too hard to test this layout for a frozen distribution? We can setup the binary however we like. At the moment, in the application directory there is a "bin" directory and a "lib" directory. I suppose "languages" ought to go in the top level or in "lib" - the top level would match the source, so it might as well be that, I guess. > To imitate the fallback behaviour for dialogs I had to play > a bit with sys.path. Setting a language, besides > initializing gettext, appends > __file__/../Languages/langcode/DIALOGS and > __file__/../Languages/langcode[0:2]/DIALOGS to sys.path. > (Actually it's not __file__ but this_filename and it's not > [0:2] but the first "_" split.) > > Say the locale "es_ES", then SB will try to import > "i18n_dialog.py" from "Languages/es_ES/DIALOGS" then from > "Languages/es/DIALOGS". If it fails it fallbacks to the current code. Rather than using __file__ (which is what is in this_filename?), use manager.application_directory (or self.application_directory if this is in the Manager class). (But I can make this change to any patch, anyway). > GETTEXTED DIALOGS [...] This sounds workable to me, but I'd have to see it in action to really get an idea of what is happening (I'm only vaguely familiar with rcparse anyway). 
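The lookup order discussed in this message can be sketched roughly as follows, in current Python 3. The "Languages" directory name and the "es_ES" then "es" fallback come from the thread; the function names, the "messages" domain, and the "app" directory are assumptions made for the example.

```python
import gettext
import os


def dialog_search_dirs(app_dir, langcode):
    """Return the DIALOGS directories to try, most specific first."""
    candidates = [langcode]
    short = langcode.split("_", 1)[0]  # "es_ES" -> "es"
    if short != langcode:
        candidates.append(short)
    return [os.path.join(app_dir, "Languages", code, "DIALOGS")
            for code in candidates]


def load_messages(app_dir, preferred):
    """Try each preferred language in order, else use untranslated text."""
    return gettext.translation(
        "messages",  # assumed domain name
        localedir=os.path.join(app_dir, "Languages"),
        languages=preferred,
        fallback=True,  # NullTranslations if no catalog is found
    )


dirs = dialog_search_dirs("app", "es_ES")
catalog = load_messages("app", ["es_ES", "en"])
print(dirs)
print(catalog.gettext("Delete as Spam"))
```

With fallback=True a missing catalog is not an error: the returned object simply hands every string back untranslated, which matches the "fall back to the current code" behaviour described for the dialogs.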
=Tony Meyer From hatukanezumi at users.sourceforge.net Thu Oct 21 06:33:49 2004 From: hatukanezumi at users.sourceforge.net (Hatuka*nezumi) Date: Thu Oct 21 06:39:36 2004 Subject: [spambayes-dev] State of I18N In-Reply-To: References: Message-ID: <20041021133349.1bcfe5c1.hatukanezumi@users.sourceforge.net> On Wed, 20 Oct 2004 18:56:45 +0200 > GETTEXTED DIALOGS > > I mentioned before that I would like to make rcparser.py > gettext aware so a translator could work out a solution without > having to edit the resource dialogs. > > The problem I found is that the output of the parser is a dict > that includes string literals subject to be translated (like > caption or labels) and literals that do not (like font names.) > Once in the dict there's no way to differentiate between them. > As I expect that changes in the dict would imply changes all > over SB, I thought of a partial solution (call it a hack) > that may work. The current code for the sb_server HTTP interface doesn't allow non-ASCII output. UTF-8 will be allowed by adding the option accept_utf8 to xmllib.XMLParser.__init__() in PyMeldLite.py. So (at least internally) UTF-8 must be used as the character set of translated literals on the HTTP interface. On the other hand, literals in dialogs, menu items etc. in the Outlook plugin need to use an ANSI codepage, e.g. CP1252, CP932 etc. I suggest that all variable strings for the Outlook plugin be kept in the .rc file as STRING resources (if rcparser.py could support STRING resources).
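As a rough illustration of the encoding constraint under discussion, the sketch below checks sample byte strings against the byte class that the follow-up messages in this thread quote, '[\t\r\n -\176\240-\377]', rewritten here with hex escapes. It is written for current Python 3; the helper name and the sample strings are assumptions, and the check only mirrors the quoted class, not xmllib itself.

```python
import re

# Byte class quoted in the thread, with hex escapes: tab, CR, LF,
# 0x20-0x7E and 0xA0-0xFF are accepted; any other byte is rejected.
REJECTED = re.compile(rb"[^\t\r\n\x20-\x7e\xa0-\xff]")


def passes(data):
    """True if no byte of data falls outside the accepted class."""
    return REJECTED.search(data) is None


samples = {
    "latin-1 Spanish": "ma\u00f1ana".encode("latin-1"),          # n with tilde
    "cp1252 euro sign": "\u20ac".encode("cp1252"),               # byte 0x80
    "cp932 Japanese": "\u65e5\u672c\u8a9e".encode("cp932"),      # lead bytes below 0xA0
    "utf-8 Japanese": "\u65e5\u672c\u8a9e".encode("utf-8"),      # continuation bytes below 0xA0
}
for label, data in samples.items():
    print(label, passes(data))
```

The excluded range 0x80-0x9F is what trips up some code pages and much UTF-8 text, while plain Latin-1 text passes.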
--- nezumi From nezumi at jca.apc.org Thu Oct 21 07:19:31 2004 From: nezumi at jca.apc.org (Hatuka*nezumi - IKEDA Soji) Date: Thu Oct 21 07:25:14 2004 Subject: [spambayes-dev] State of I18N In-Reply-To: <20041021133349.1bcfe5c1.hatukanezumi@users.sourceforge.net> References: <20041021133349.1bcfe5c1.hatukanezumi@users.sourceforge.net> Message-ID: <20041021141931.6693cd88.nezumi@jca.apc.org> On Thu, 21 Oct 2004 13:33:49 +0900 Hatuka*nezumi wrote: > Current code for sb_server HTTP interface doesn't allow > non-ASCII output. I was partially wrong. Most of single-byte character sets may be allowed, But some singlebyte/multibye character sets (codepages) are not allowed since they contain bytes not match '[\t\r\n -\176\240-\377]' which are not allowed by xmllib. --- nezumi From hernan at orgmf.com.ar Thu Oct 21 08:19:09 2004 From: hernan at orgmf.com.ar (=?iso-8859-1?Q?Hernan_Mart=EDnez_Foffani?=) Date: Thu Oct 21 08:19:06 2004 Subject: [spambayes-dev] State of I18N In-Reply-To: Message-ID: >> Some things may require some rework though: >> - Multiline messages that use one print sentence per line. >> (Replace those by "xxx\r\n" "yyy\r\n" "zzz\r\n"?)
> > Is this to make the translating easier so you have more context, or > for some other reason? I would say "possible" rather than "easier". While editing the messages file, the translator can't tell for certain that a collection of consecutive string fragments forms a single meaningful message. >> - Those that mix literal strings with parameters concatenating >> them instead of using placeholders. > > You mean "string" + var + "string" instead of "string %s string" % > (var,)? I'd much rather stick with the latter, if possible. I meant the latter too. I was only listing the possible problems. >> DIRECTORIES > ... > We can setup the binary however we like. At the moment, in the > application directory there is a "bin" directory and a "lib" > directory. I suppose "languages" ought to go in the top level or in > "lib" - the top level would match the source, so it might as well be > that, I guess. Fine. I'll stick with "languages" at the top level. >> To imitate the fallback behaviour for dialogs I had to play >> a bit with sys.path. Setting a language, besides >> initializing gettext, appends >> __file__/../Languages/langcode/DIALOGS and >> __file__/../Languages/langcode[0:2]/DIALOGS to sys.path. >> (Actually it's not __file__ but this_filename and it's not >> [0:2] but the first "_" split.) >> >> Say the locale "es_ES", then SB will try to import >> "i18n_dialog.py" from "Languages/es_ES/DIALOGS" then from >> "Languages/es/DIALOGS". If it fails it fallbacks to the current code. > > Rather than using __file__ (which is what is in this_filename?), use > manager.application_directory (or self.application_directory if this > is in the Manager class). (But I can make this change to any patch, > anyway). this_filename holds either __file__ when running from source or the path to the binary DLL in a frozen distribution. I hadn't seen manager.application_directory; I'll check it.
But take into account that I need to register the translation very early to catch (and translate) as many messages as possible. >> GETTEXTED DIALOGS > [...] > > This sounds workable to me, but I'd have to see it in action to > really get an idea of what is happening (I'm only vaguely familiar > with rcparse anyway). I'll upload the patches later so you can see that better. Thanks, -H. From hernan at orgmf.com.ar Thu Oct 21 12:56:24 2004 From: hernan at orgmf.com.ar (Hernán Martínez Foffani) Date: Thu Oct 21 12:57:27 2004 Subject: [spambayes-dev] State of I18N In-Reply-To: Message-ID: [edited] > Current code for sb_server HTTP interface doesn't allow > some singlebyte/multibye [. Their] character sets > (codepages) are not allowed since they contain bytes > [that do] not match > '[\t\r\n -\176\240-\377]' which are not allowed by xmllib. > > UTF-8 will be allowed by adding option > accept_utf8 to xmllib.XMLParser.__init__() in PyMeldLite.py. > So (at least internally) UTF-8 must be used for character set of > translated literals on HTTP interface. Thanks for the info. I guess I have to work on that a bit more. > On the other hand, leterals in dialogs, menu items etc. on > Outlook plugin need to use ANSI codepage, e.g. CP1252, CP932 > etc. > > I suggest all of variable strings for Outlook plugin are > contained in .rc file as STRING resources (if rcparser.py > could support STRING resource). It doesn't :( rcparser supports neither the code_page pragma nor STRING resources right now. I'm thinking aloud here... If rcparser.py grows to support STRING_TABLE that wouldn't solve the whole problem, right? Because in a graphical resource editor you can't map captions and labels to STRING_TABLE entries. Let's suppose that rcparser can detect the resource code_page. A translator could draw the dialogs with their captions and labels in a resource editor of her choice.
If this editor writes the correct code_page in the new dialogs.rc file the parser could pick up the pragma and decode the strings accordingly. The next step, dumping the dict to a .py by rc2py, would need to mark that .py with the corresponding encoding declaration or dump it in utf-8. During runtime SB would import such .py with the correct encoding. The rest of the strings to be translated (the string literals in SB python source code) goes through gettext. The translator should specify in the messages table file which encoding she's using. I guess that it may even differ from the one used in dialogs.rc Could this scheme work? -H. From hatukanezumi at users.sourceforge.net Fri Oct 22 07:28:11 2004 From: hatukanezumi at users.sourceforge.net (Hatuka*nezumi) Date: Fri Oct 22 07:34:05 2004 Subject: [spambayes-dev] State of I18N In-Reply-To: References: Message-ID: <20041022142811.5e7c38fb.hatukanezumi@users.sourceforge.net> On Thu, 21 Oct 2004 12:56:24 +0200 <> > Let's suppose that rcparser can detect the resource > code_page. A translator could draw the dialogs with their > captions and labels in a resource editor of her choice. If > this editor writes the correct code_page in the new dialogs.rc > file the parser could pick up the pragma and decode the strings > accordingly. The next step, dumping the dict to a .py by rc2py, > would need to mark that .py with the corresponding encoding > declaration or dump it in utf-8. During runtime SB would > import such .py with the correct encoding. > > The rest of the strings to be translated (the string literals > in SB python source code) goes through gettext. The translator > should specify in the messages table file which encoding she's > using. I guess that it may even differ from the one used in > dialogs.rc My posts are a bit confusing, sorry. If rcparser.py supports code_page pragma, I think your approach will work. > Could this scheme work? I agree. 
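The code_page scheme agreed on above might look roughly like this in current Python 3: find a "#pragma code_page(N)" line in the raw resource script, decode the bytes with the matching codec, and re-encode as UTF-8 for the dumped .py file. The function names, the CP1252 default when no pragma is present, and the sample resource text are assumptions for the sketch; rcparser.py did not support the pragma at the time of this thread, and not every Windows code page has a Python codec named "cpN".

```python
import re

# Matches lines such as "#pragma code_page(1252)" in resource-script bytes.
CODE_PAGE = re.compile(rb"^[ \t]*#pragma[ \t]+code_page\((\d+)\)", re.MULTILINE)


def rc_codec(rc_bytes, default="cp1252"):
    """Guess a Python codec name from the first code_page pragma, if any."""
    match = CODE_PAGE.search(rc_bytes)
    if match is None:
        return default  # assumption: no pragma means CP1252
    page = int(match.group(1))
    return "utf-8" if page == 65001 else "cp%d" % page


def rc_to_utf8(rc_bytes):
    """Decode with the declared code page and re-encode as UTF-8."""
    return rc_bytes.decode(rc_codec(rc_bytes)).encode("utf-8")


caption = "\u65e5\u672c\u8a9e"  # sample Japanese caption text
rc_bytes = (b'#pragma code_page(932)\r\nCAPTION "'
            + caption.encode("cp932") + b'"\r\n')
print(rc_codec(rc_bytes))
print(rc_to_utf8(rc_bytes).decode("utf-8"))
```

Normalising everything to UTF-8 at dump time would let the message catalog and the dialogs use different source encodings, as suggested in the thread, without the runtime having to care.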
--- nezumi From ta-meyer at ihug.co.nz Fri Oct 22 07:52:39 2004 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Fri Oct 22 07:52:44 2004 Subject: [spambayes-dev] Windows Installer not offering to add startup icon Message-ID: It appears that our installers aren't offering to create a startup icon for sb_server users as they should. I've checked 1.0a9, 1.0rc1, 1.0rc2 and 1.0 and they all skip it. (I'm fairly impressed that sb_server users are clever enough to do it themselves!). The problem is the "Check: InstallingProxy" line in the Inno script. Although the selection code has been run by the time the tasks are offered, it still has the default (False) value. I've played around with the code, but can't figure a way around this (although my Pascal is extremely rusty). We can fix it by removing that Check. Outlook users don't get that page anyway, so they won't see the option (this is what happens with the desktop icon). However, since we want it checked by default, it will appear in the text box of additional tasks, even though it doesn't happen. Does this matter? I'm doubtful that anyone reads that box anyway, so my vote is no. If no-one complains, then I'll check the change in. =Tony Meyer P.s. If anyone wants to do a sanity check for me and run any of the installers and see if it gets offered, that would be great ;) From hernan at orgmf.com.ar Fri Oct 22 11:05:56 2004 From: hernan at orgmf.com.ar (Hernan Martinez Foffani) Date: Fri Oct 22 11:06:59 2004 Subject: [spambayes-dev] State of I18N In-Reply-To: <20041022142811.5e7c38fb.hatukanezumi@users.sourceforge.net> Message-ID: >> Let's suppose that rcparser can detect the resource >> code_page. A translator could draw the dialogs with their >> captions and labels in a resource editor of her choice. If >> this editor writes the correct code_page in the new dialogs.rc >> file the parser could pick up the pragma and decode the strings >> accordingly.
The next step, dumping the dict to a .py by rc2py, >> would need to mark that .py with the corresponding encoding >> declaration or dump it in utf-8. During runtime SB would >> import such .py with the correct encoding. >> ... > > My posts are a bit confusing, sorry. If rcparser.py supports > code_page pragma, I think your approach will work. Ok. I'll schedule adding support for code_page as a task for myself. Thanks for your second opinion, Nezumi. -H. From mwm at aaahawk.com Wed Oct 27 13:23:42 2004 From: mwm at aaahawk.com (mwm@aaahawk.com) Date: Wed Oct 27 13:23:44 2004 Subject: [spambayes-dev] need help Message-ID: <417f853e.15e.930.12231@aaahawk.com> I downloaded SpamBayes but wasn't savvy enough to install it properly. I uninstalled it. Now Outlook Express "can't find my smtp server". I can get onto the net ok and can use webmail from my ISP but always get the above error when trying to receive or send mail. Thanks From harrysigerson at ntlworld.com Sat Oct 30 15:47:17 2004 From: harrysigerson at ntlworld.com (Harry Sigerson) Date: Sat Oct 30 15:47:13 2004 Subject: [spambayes-dev] A query on Frequently Asked Question 4.6 Message-ID: Dear SpamBayes, A query on FAQ 4.6 ================ 4.6 Do I need to keep spam after it has been trained? If so, for how long? Once a message has been [correctly] trained there is no need to keep it around. However, SpamBayes' accuracy is dependent upon having a "sufficient" sample from which to make its decisions. Therefore, most users retain a fair amount of spam in the event that they may wish to rebuild the corpus from scratch. Of course, this begs the question: "how much is enough?" That is where the "art" of SpamBayes meets the science. Some users keep as many as several thousand [recent] spam (as well as a similar number of ham). That is not to say that you won't have excellent results with a tenth (or less) of that number; since everyone's e-mail profile is different, the requirements for training are as well.
================ The FAQ item 4.6 above addressed something that I've wondered about since I got SB going around July this year, that is, "Where does all the 'saved' spam go?". I do have a lot of spam now, 16,054 spams and 1,725 hams. The thing is, where do I go to cut down on the amount of spam? I see the comment, "Warning: you have much more spam than ham - SpamBayes works best with approximately even numbers of ham and spam." in the 'Status and Configuration' section. I'll have another look and see if I can figure out where all the spam is located... ================ 4.3 How do I train SpamBayes (forward/bounce method)? Alternatively, when you receive an incorrectly classified message, you can forward it to the SMTP proxy for training. If the message should have been classified as spam, forward or bounce the message to spambayes_spam@localhost, and if the message should have been classified as ham, forward it to spambayes_ham@localhost. You can still review the training through the web interface, if you wish to do so. You should ensure that the "lookup message in cache" option is set to True/Yes before you use this. Note that some mail clients (particularly Outlook Express) do not forward all headers when you bounce, forward or redirect mail. We do not recommend using the SMTP proxy with these clients. =============== ...I saw from that item 4.3 that there are locations at... spambayes_spam@localhost ...and at... spambayes_ham@localhost ...are these locations on my computer or are they located at my ISP's server? This is a standalone machine with a cable-modem broadband connection to NTL... I've just used XP's Search facility to see if spambayes_spam@localhost exists as a file on this machine; it didn't find one, and I didn't expect that it would. Are these thousands of spam actually located on my ISP's server, that is on NTL's server?
If so, how do I go about accessing them so that I can reduce the amount of spam to something near the 'ham' figure of 1,725? Excuse the long explanation. Regards, Harry Sigerson.