From hernan at orgmf.com.ar Mon Nov 1 23:25:14 2004 From: hernan at orgmf.com.ar (=?iso-8859-1?Q?Hernan_Mart=EDnez_Foffani?=) Date: Mon Nov 1 23:25:25 2004 Subject: [spambayes-dev] i18n revisited Message-ID: Some issues: For the scripts to be translated, I think that it would be better if i18n.py be under the spambayes directory so it can be reached by either the addin and the scripts. Regarding sb_server, I could make it work with some messages translated. I didn't have time to see ui.html but a solution like dialogs.rc might work though. Later, -Hern?n. From tameyer at ihug.co.nz Tue Nov 2 04:34:51 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Nov 2 04:35:53 2004 Subject: [spambayes-dev] i18n revisited In-Reply-To: Message-ID: > For the scripts to be translated, I think that it > would be better if i18n.py be under the spambayes > directory so it can be reached by either the > addin and the scripts. Works for me. I'm happy to check in the stuff that's done so far so that it's easy for other people to test - it's all working nicely for me. Sound ok to you? > Regarding sb_server, I could make it work with > some messages translated. I didn't have time > to see ui.html but a solution like dialogs.rc > might work though. Richie might have ideas here, hopefully :) Attached is a possible patch for PyMeldLite.py and sb_server.py, which works for me. I don't know if this is the best way to do it, though. Also attached is a script that generates a dummy Python script from ui.html that can be given to pygettext.py to generate the .pot file. It could use some tidying up, still, as it has a lot of junk, but it's a start. I've also attached a sample .po file, which only translates the "SpamBayes Web Interface" header. i18n.py needs to have the extra .mo added. What do you think? =Tony.Meyer -------------- next part -------------- from spambayes import PyMeldLite from spambayes.resources import ui_html # Special class that saves the retrieved text to a global list. 
saved_text = []

class _SaveTextNode(PyMeldLite._TextNode):
    def toText(self):
        text = self._text
        saved_text.append(text)
        return text

# Replace the default _TextNode with our one.
PyMeldLite._TextNode = _SaveTextNode

# Create the page (this has all the ui.html data)
page = PyMeldLite.Meld(ui_html.data)

# 'Print' the page, filling our saved_text global.
str(page)

# Run through the saved text, creating a fake Python script printing
# each line.
fake_script = open("gettext_ui_html.py", "w")
for line in saved_text:
    # Turn HTML non-breaking spaces into plain spaces.
    line = line.replace('&nbsp;', ' ')
    if line.strip():
        fake_script.write('print _("""%s""")\n' % (line,))
fake_script.close()
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PyMeldLite.py.diff
Type: application/octet-stream
Size: 769 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20041102/7785d4c5/PyMeldLite.py-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: spambayes_resources_ui_html.po
Type: application/octet-stream
Size: 25837 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20041102/7785d4c5/spambayes_resources_ui_html-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sb_server.py.diff
Type: application/octet-stream
Size: 1709 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20041102/7785d4c5/sb_server.py-0001.obj

From hernan at orgmf.com.ar Tue Nov 2 11:39:28 2004
From: hernan at orgmf.com.ar (=?iso-8859-1?Q?Hern=E1n_Mart=EDnez_Foffani?=)
Date: Tue Nov 2 11:38:44 2004
Subject: [spambayes-dev] i18n revisited
In-Reply-To:
Message-ID:

>> For the scripts to be translated, I think that it
>> would be better if i18n.py be under the spambayes
>> directory so it can be reached by either the
>> addin and the scripts.
>
> Works for me.
I'm happy to check in the stuff that's done so far so > that it's easy for other people to test - it's all working nicely for > me. Sound ok to you? A couple of things first: The i18n.py that I uploaded lacks the locale_language() method (or something like that --I don't have the code at hand--. More important, though, is that if i18n.py ends under spambayes I would like to add a flag to allow mucking with sys.path (we only need this for the addin.) Besides, I want to change the test to the doctest format. OTOH, you may check it in if you like. I'll resync from HEAD and post in SF any future patch I make. >> Regarding sb_server, I could make it work with >> some messages translated. I didn't have time >> to see ui.html but a solution like dialogs.rc >> might work though. > > Richie might have ideas here, hopefully :) > > Attached is a possible patch for PyMeldLite.py and sb_server.py, > which works for me. I don't know if this is the best way to do it, > though. Also attached is a script that generates a dummy Python > script from ui.html that can be given to pygettext.py to generate the > .pot file. It could use some tidying up, still, as it has a lot of > junk, but it's a start. I've also attached a sample .po file, which > only translates the "SpamBayes Web Interface" header. i18n.py needs > to have the extra .mo added. > > What do you think? Having an extra .mo is not a problem as long as their content is very specific. I thought of that for the dialogs.rc case. Just *don't* use "_" in the places where we want to use the alternative (in your proposal, only in PyMeldLite.py:_GetTextNode.toText()) but the expanded gettext call. Anyway, won't it be hard to maintain? There's lots of strings there that shouldn't be translated, right?. I guess that's what you called "junk" but, would it be possible to separate translatable/ non-translatable strings in ui.html? 
Of course, I could check one by one and delete all of those from the .po but any future change on the original would require a complete recheck. I was thinking something like opening a given i18n.ui.html in the corresponding language directory and falling back to the standard ui.html if not found. But I don't know if it's too much power to the translator. Regarding the sb_server.py patch, I had also added the initialization of the LanguageManager in the init() function but it failed later when I tried to translate Options.py. The addin delays the import after the LanguageManager but sb_server.py does not. We can either: 1) Call LanguageManager at the very beginning of each main module. We have to be careful with the program flow and which imports whom to avoid NameError exceptions. 2) Rebind "_" at i18n.py *import* time with, say: def _(x): return x or, better yet, installing the NullTranslation (same results) But I don't know if you use "_" at the command prompt too much. ;-) 3 and up) any other option that I don't know. -H. From kenny.pitt at gmail.com Tue Nov 2 19:50:02 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Tue Nov 2 19:50:08 2004 Subject: [spambayes-dev] Re: [Spambayes-checkins] spambayes/spambayes Stats.py, 1.6, 1.7 In-Reply-To: References: Message-ID: <2a052b9904110210501e3ea2c8@mail.gmail.com> Tony Meyer wrote: > The Outlook stats could be changed to look > more like this (or the damn code could be centralised), too, maybe, except that there > isn't much room in the dialog for a lot of text. Maybe Kenny has a patch for that? > (A spambayes-dev message indicated that he might). I had some experimental code that added a separate Statistics tab to the SpamBayes Manager dialog to give this more room. I believe I also have at least the majority of the new statistics implemented in that as well. I'll see if I can find a little time to update it for the latest version of the 1.1 code and get it checked in. 
-- 
Kenny Pitt

From richie at entrian.com Tue Nov 2 22:45:42 2004
From: richie at entrian.com (Richie Hindle)
Date: Tue Nov 2 22:45:48 2004
Subject: [spambayes-dev] i18n revisited
In-Reply-To:
References:
Message-ID: <0mvfo0hv41bbkqhkn0h71gmp2feetduvjc@4ax.com>

[Tony]
> Richie might have ideas here, hopefully :)

Yikes! 8-) I know little about all this, never having had anything to do with gettext.

From what I can see of your attached .po file, Tony, that route looks like a nightmare for the translator.

Hernán said "I was thinking something like opening a given i18n.ui.html in the corresponding language directory" and that sounds much more like it. I can't imagine that anyone willing to translate SpamBayes would be too scared or underqualified to edit the raw HTML. And even if they were, things should continue to work even if they use a WYSIWYG HTML editor to do the translation - that's (in untested theory) one of the benefits of PyMeldLite. Another advantage is that they would be doing the translation in context, rather than on a disjoint bunch of strings.

-- 
Richie Hindle
richie@entrian.com

From tameyer at ihug.co.nz Tue Nov 2 22:59:15 2004
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Nov 2 22:59:21 2004
Subject: [spambayes-dev] i18n revisited
In-Reply-To:
Message-ID:

[Hernán Martínez Foffani]
> The i18n.py that I uploaded lacks the locale_language() method (or
> something like that --I don't have the code at hand--.

Sorry. I revised the code a little, and should have included the revised i18n.py. The checked-in version (see below) ought to work, anyway.

> More important, though, is that if i18n.py ends under spambayes
> I would like to add a flag to allow mucking with sys.path (we
> only need this for the addin.)
> Besides, I want to change the test to the doctest format.
>
> OTOH, you may check it in if you like.
> I'll resync from HEAD and post in SF any future patch I make.

I've gone with this.
I've checked it in as-is for the moment, and changes can be made against that - it'll make it easier to progress, I think, and those are more modifications (especially the doctest stuff) than additions. > Having an extra .mo is not a problem as long as their content > is very specific. I thought of that for the dialogs.rc case. Is there a way to create a .pot file from several input files? Conceptually, it would be better (I think) to have a .mo for Outlook and one for the other scripts, but I don't know how to combine the ones for (eg) sb_server.py and the ui.html stuff. > Just *don't* use "_" in the places where we want to use the > alternative (in your proposal, only in > PyMeldLite.py:_GetTextNode.toText()) but the expanded gettext > call. gettext.gettext()? (Sorry - I'm not sure what you mean - I'm very much new to this!) > Anyway, won't it be hard to maintain? There's lots of strings > there that shouldn't be translated, right?. Some, yes. Mostly the example text from ui.html that is there to explain what each piece is doing, and is dynamically removed when pages are built at runtime. > I guess that's what > you called "junk" but, would it be possible to separate translatable/ > non-translatable strings in ui.html? Possibly (that was what I thought of first, but it didn't seem like a very nice solution). Translatable strings could be enclosed in "", for example, but that adds a lot to ui.html. > I was thinking something like opening a given i18n.ui.html in > the corresponding language directory and falling back to the > standard ui.html if not found. But I don't know if it's too much > power to the translator. I think this would be harder, because you'd have to ensure that the i18n.ui.html and ui.html matched very closely - if an element from ui.html was added/removed, that would have to also be done to all the i18n.ui.html ones, or the interface would crash with an AttributeError. Working with just the strings would be much easier to maintain, IMO. 
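The per-language lookup proposed above (use a translated i18n.ui.html when one exists, otherwise the standard ui.html) could be sketched roughly as follows. This is only an illustration: `candidate_languages` and `find_ui_template` are hypothetical names, not functions from SpamBayes, and trying the bare language code ("es") after a regional one ("es_AR") is an extra assumption.

```python
import os


def candidate_languages(code):
    """Hypothetical helper: 'es_AR' -> ['es_AR', 'es']."""
    parts = code.split("_")
    return ["_".join(parts[:i]) for i in range(len(parts), 0, -1)]


def find_ui_template(language, languages_dir, default_path):
    """Return the translated template path if present, else default_path.

    Assumes a layout of <languages_dir>/<code>/i18n.ui.html.
    """
    for code in candidate_languages(language):
        path = os.path.join(languages_dir, code, "i18n.ui.html")
        if os.path.isfile(path):
            return path
    # No translation found: use the standard (English) ui.html.
    return default_path
```

Note that this only chooses a file. It does nothing about the concern that a translated template missing an element would still fail at runtime.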
[but then Richie added] > From what I can see of your attached .po file, Tony, that route looks > like a nightmare for the translator. Well, I did indicate that I'd tidy it up somewhat . > Hern?n said "I was thinking something like opening a given i18n.ui.html > in the corresponding language directory" and that sounds much more like > it. [...] I'm happy to be outvoted here. I guess we can build something to keep the ui.html's in sync if necessary... > Another advantage is that they would be doing > the translation in context, rather than on a disjoint bunch of strings. That would be an advantage, yes. I could add some additional comments indicating the blocks of text that do not need to be translated (the Introduction, for example). I'll work up a new patch with this system and post it here later today... [Hern?n Mart?nez Foffani] > Regarding the sb_server.py patch, I had also added the initialization > of the LanguageManager in the init() function but it failed later > when I tried to translate Options.py. The addin delays the import > after the LanguageManager but sb_server.py does not. We can either: > > 1) Call LanguageManager at the very beginning of each main module. > We have to be careful with the program flow and which imports > whom to avoid NameError exceptions. > > 2) Rebind "_" at i18n.py *import* time with, say: > def _(x): return x > or, better yet, installing the NullTranslation (same results) > But I don't know if you use "_" at the command prompt too much. #2 sounds good to me, but, as I said, I'm new to this. I never use '_' - but even if I did, that would only change behaviour if I (in an interactive session) did "from spambayes.i18n import _", right? 
=Tony.Meyer From kennypitt at hotmail.com Tue Nov 2 23:29:28 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Tue Nov 2 23:30:06 2004 Subject: [spambayes-dev] i18n revisited In-Reply-To: <0mvfo0hv41bbkqhkn0h71gmp2feetduvjc@4ax.com> Message-ID: Richie Hindle wrote: > [Tony] >> Richie might have ideas here, hopefully :) > > Yikes! 8-) I know little about all this, never having had anything > to do with gettext. > >> From what I can see of your attached .po file, Tony, that route looks > like a nightmare for the translator. > > Hern?n said "I was thinking something like opening a given > i18n.ui.html in the corresponding language directory" and that sounds > much more like it. My gut instinct tells me that translating the .html file is the way to go. This is where all the really beefy strings reside, and gettext() is more oriented towards messages that are a sentence long or shorter. It seems kind of scary to think of gettext() doing a string match on a 2KB block of text to find the contents of a static help page. -- Kenny Pitt From richie at entrian.com Tue Nov 2 23:35:43 2004 From: richie at entrian.com (Richie Hindle) Date: Tue Nov 2 23:35:48 2004 Subject: [spambayes-dev] i18n revisited In-Reply-To: References: Message-ID: [Tony] > you'd have to ensure that the > i18n.ui.html and ui.html matched very closely - if an element from ui.html > was added/removed, that would have to also be done to all the i18n.ui.html > ones, or the interface would crash with an AttributeError. True, though ui.html doesn't change very often, so keeping them in sync wouldn't be a huge overhead. You could imagine some kind of fallback system whereby if an element wasn't present in the i18n file then it was fetched from the English file, but I can't see an easy way of doing that off the top of my head. 
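One way to keep the files in sync might be a small check that reports which elements of the English template are absent from a translated one. The sketch below assumes the interface code addresses template elements by their HTML `id` attribute (an assumption about PyMeldLite, not confirmed in this thread); `element_ids` and `missing_ids` are hypothetical names.

```python
from html.parser import HTMLParser


class _IdCollector(HTMLParser):
    """Collect the value of every id attribute in a document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def element_ids(html_text):
    collector = _IdCollector()
    collector.feed(html_text)
    collector.close()
    return collector.ids


def missing_ids(english_html, translated_html):
    """Ids present in the English template but not in the translation."""
    return sorted(element_ids(english_html) - element_ids(translated_html))
```

Run against ui.html and each i18n.ui.html, a non-empty result would flag a translation that needs updating before it can cause an AttributeError.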
-- Richie Hindle richie@entrian.com From tameyer at ihug.co.nz Wed Nov 3 00:09:04 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Nov 3 00:21:13 2004 Subject: [spambayes-dev] i18n revisited In-Reply-To: Message-ID: [separate versions of ui.html] > I'll work up a new patch with this system and post it here > later today... More convincing from Richie and from Kenny has followed, so this looks like the plan :) Attached are: * i18n.ui.html [the translation is all automatic, so excuse the poor language] This goes in the languages/es folder (I suppose it could have another directory inside there, but I can't think of anything else that would go with it). * i18n_ui_html.py. This is for those without resourcepackage only - it gets automatically created if you do have it. A blank __init__.py is needed in the languages directory, and (for those with resourcepackage) a copy of the spambayes/resources/__init__.py is needed in languages/es, too. * sb_server.py.diff. A patch for sb_server to load the language manager. This is unchanged from the last one. * ProxyUI.py.diff. A patch for ProxyUI.py * UserInterface.py.diff. A patch for UserInterface.py. * A patch for i18n.py. (Against the version in CVS at the moment). This doesn't fall back from (eg) es_AR to es at the moment, but that wouldn't be hard to add in. These work for me (i.e. with "es" as my language I get the interface mostly translated, and with "foo" as my language, I get the normal version). =Tony.Meyer -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20041103/1be32a4f/i18n.ui-0001.html
-------------- next part --------------
# -*- coding: ISO-8859-1 -*-
"""Resource i18n_ui_html (from file i18n.ui.html)"""
# written by resourcepackage: (1, 0, 0)
source = 'i18n.ui.html'
package = 'languages.es'
import zlib
# [zlib-compressed payload omitted: its non-ASCII bytes were replaced with
# "?" in the archive, so the data cannot be recovered]
data = zlib.decompress("...")
### end
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ProxyUI.py.diff Type: application/octet-stream Size: 1146 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20041103/1be32a4f/ProxyUI.py-0001.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: UserInterface.py.diff Type: application/octet-stream Size: 4102 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20041103/1be32a4f/UserInterface.py-0001.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: i18n.py.diff Type: application/octet-stream Size: 1520 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20041103/1be32a4f/i18n.py-0001.obj From hernan at orgmf.com.ar Wed Nov 3 01:09:47 2004 From: hernan at orgmf.com.ar (=?iso-8859-1?Q?Hernan_Mart=EDnez_Foffani?=) Date: Wed Nov 3 01:09:40 2004 Subject: [spambayes-dev] i18n revisited In-Reply-To: Message-ID: >> OTOH, you may check it in if you like. >> I'll resync from HEAD and post in SF any future patch I make. > > I've gone with this. I've checked it in as-is for the moment, and > changes can be made against that - it'll make it easier to progress, > I think, and those are more modifications (especially the doctest > stuff) than additions. Fine! >> Having an extra .mo is not a problem as long as their content >> is very specific. I thought of that for the dialogs.rc case. > > Is there a way to create a .pot file from several input files? Just append the text pairs of msgid/msgstr. > Conceptually, it would be better (I think) to have a .mo for Outlook > and one for the other scripts, but I don't know how to combine the > ones for (eg) sb_server.py and the ui.html stuff. In my opinion gettext is more suitable to partition by applications domains than logic ones. 
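Partitioning by application domain could look roughly like the sketch below: one translation object per domain, all sharing the same language list. The domain names and the "languages" directory are placeholders chosen for illustration, and `load_domains` is a hypothetical helper. With `fallback=True`, a missing .mo file yields a pass-through translation instead of an error.

```python
import gettext


def load_domains(domains, localedir, languages):
    """Return {domain: translation object}, one per .mo file name."""
    return {
        domain: gettext.translation(
            domain, localedir=localedir, languages=languages, fallback=True)
        for domain in domains
    }


# Placeholder domain names; each corresponds to <domain>.mo on disk.
translations = load_domains(["outlook", "sb_server"], "languages", ["es_AR", "es"])
_ = translations["sb_server"].gettext
print(_("SpamBayes Web Interface"))
```

Each module would then bind "_" to the translation object for its own domain.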
>> Just *don't* use "_" in the places where we want to use the
>> alternative (in your proposal, only in
>> PyMeldLite.py:_GetTextNode.toText()) but the expanded gettext
>> call.
>
> gettext.gettext()? (Sorry - I'm not sure what you mean - I'm very
> much new to this!)

Yes. Actually, there is a way to keep using the "_" but then i18n.py would need some rework. From the Python docs:

    import gettext
    t = gettext.translation('logic_partition', ...)
    _ = t.gettext

A translation instance has an application domain and a fallback language code list. AFAIK, you can't modify the application domain without reinstantiating the object. We'll need a set of objects with the same lang list but different application domain name. The application domain is the string that corresponds to the name of the .mo file.

> [...]
> I'm happy to be outvoted here. I guess we can build something to
> keep the ui.html's in sync if necessary...
> ...
> I'll work up a new patch with this system and post it here later
> today...

uh... I'm falling behind with the current thread. Good! ;-)
I'll read your new patches and see if I can manage to translate the html without screwing up everything.

> [Hernán Martínez Foffani]
>> Regarding the sb_server.py patch, I had also added the initialization
>> of the LanguageManager in the init() function but it failed later
>> when I tried to translate Options.py. The addin delays the import
>> after the LanguageManager but sb_server.py does not. We can either:
>>
>> 1) Call LanguageManager at the very beginning of each main module.
>> We have to be careful with the program flow and which imports
>> whom to avoid NameError exceptions.
>>
>> 2) Rebind "_" at i18n.py *import* time with, say:
>> def _(x): return x
>> or, better yet, installing the NullTranslation (same results)
>> But I don't know if you use "_" at the command prompt too much.
>
> #2 sounds good to me, but, as I said, I'm new to this.
> I never use '_' - but even if I did, that would only change
> behaviour if I (in an interactive session) did
> "from spambayes.i18n import _", right?

Yep.

-H.

From ta-meyer at ihug.co.nz Wed Nov 3 01:57:15 2004
From: ta-meyer at ihug.co.nz (Tony Meyer)
Date: Wed Nov 3 01:57:30 2004
Subject: [spambayes-dev] [ 1059170 ] SpamBayes-1.0 msvcrt.dll install crash
Message-ID:

Hi everyone,

If there's someone with WinXP SP2 that has a spare moment, could you verify this bug? Alternatively, is there anyone here who knows anything about changes to msvcrt.dll with SP2?

From what I can tell, outlook_addin_register.exe only uses:

    00000249  exit
    0000026A  getenv
    000002B2  sprintf
    000002C1  strncpy
    000002C3  strrchr
    0000008F  _acmdln
    0000009D  _adjust_fdiv
    000000B7  _controlfp
    000000CA  _except_handler3
    000000D3  _exit
    0000010F  _initterm
    00000113  _iob
    00000194  _putenv
    000001AE  _snprintf
    00000048  _XcptFilter
    00000058  __getmainargs
    0000006A  __p__commode
    0000006F  __p__fmode
    00000062  __p___argc
    00000063  __p___argv
    00000083  __setusermatherr
    00000081  __set_app_type

From msvcrt.dll. Wouldn't a change to these break a lot of programs?

=Tony.Meyer

From kennypitt at hotmail.com Wed Nov 3 18:51:53 2004
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Wed Nov 3 18:52:04 2004
Subject: [spambayes-dev] [ 1059170 ] SpamBayes-1.0 msvcrt.dll install crash
In-Reply-To:
Message-ID:

Tony Meyer wrote:
> If there's someone with WinXP SP2 that has a spare moment, could you
> verify this bug?

I built a copy of the binary and registered it with outlook_addin_register on my XP SP2 system. Registration ran without error, and I was able to load Outlook and access SpamBayes with no problems. I also confirmed in the registry that outlook_addin_register had, in fact, correctly registered outlook_addin.dll as the handler for the SpamBayes addin COM object.

> Alternatively, is there anyone here who knows anything about changes
> to msvcrt.dll with SP2?
7.0.2600.2180 is the correct version number for the XP SP2 version of msvcrt.dll, so we know it wasn't overwritten by some rogue installer that didn't follow the Microsoft rules. I can't speak to specific changes made for SP2, but this is now a system DLL and Microsoft has guaranteed that the interface will never change. Given the extensiveness of their compatibility testing and the number of users outside of Microsoft that beta tested SP2 before it was released, I suspect they would have discovered it by now if there was a compatibility problem. >> From what I can tell, outlook_addin_register.exe only uses: > [snip] >> From msvcrt.dll. Wouldn't a chance to these break a lot of programs? Yes it would, which is why I don't believe that this is the real cause of the problem. Given the thousands of applications that reference that DLL, I can't imagine that Microsoft could have missed a compatibility issue severe enough to cause a GPF in an app that uses only the most basic functions from MSVCRT. -- Kenny Pitt From tameyer at ihug.co.nz Thu Nov 4 04:19:15 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Nov 4 04:19:54 2004 Subject: [spambayes-dev] [ 1059170 ] SpamBayes-1.0 msvcrt.dll install crash In-Reply-To: Message-ID: > I built a copy of the binary and registered it with > outlook_addin_register on my XP SP2 system. Registration ran > without error, and I was able to load Outlook and access SpamBayes > with no problems. Thanks :) Somewhat ironically, my SP2 CD arrived in the post yesterday, so I'm able to check this out now, as well. (The distributed 1.0 binary also appears to work fine). > Yes it would, which is why I don't believe that this is the > real cause of the problem. Given the thousands of applications that > reference that DLL, I can't imagine that Microsoft could have missed a > compatibility issue severe enough to cause a GPF in an app that > uses only the most basic functions from MSVCRT. Agreed. 
I've made similar comments in the tracker - my best guess is that the OP tried it once, thought "a ha - I know how to fix this", replaced the .dll and it worked. However, that first time could have been some weird fluke (with this rather unstable machine I get GPF's fairly often, in very stable apps in unreproducible ways), and the 'fix' wasn't really necessary. Thanks for checking it out, though :) =Tony.Meyer From falcon_050 at hotmail.com Thu Nov 4 09:02:49 2004 From: falcon_050 at hotmail.com (falcon_050) Date: Thu Nov 4 09:03:05 2004 Subject: [spambayes-dev] wish: keep spam-mail from downloading to OE6 Message-ID: Using SB for half a year I am generally very content. Yet I have (another) wish for using SB with OE6: Is it possible to not send mail marked "spam" to OE but instead just keep it in the SB's cache. While training I still can view these messages if I want to. The reason for my wish is that OE6 displays a "new mail icon" in the tray for spam it receives from SB even if it is (by rule) marked as read and put into trash. This new mail icon does not disappear when OE is opened. You have to restart it or really open one of the spam mail. This is very frustrating: you think you got new mail. The only thing you got is spam in the trashcan. I tried some other options in OE but they don't seem to work properly: (1) "Do not download from server". (2) "Remove from server" Regards and another compliment for your great product, Regards Valk Beekman, Amsterdam NL PS I somehow can't get the auto-train option to work -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20041104/bf51e54d/attachment.html From mmolteni at cisco.com Thu Nov 4 09:23:15 2004 From: mmolteni at cisco.com (Marco Molteni) Date: Thu Nov 4 09:51:49 2004 Subject: [spambayes-dev] thanks for spambayes Message-ID: <20041104092315.46130122@barbapapa.cisco.com> I just wanted to thank you all for the wonderful spambayes. It is saving me time day after day. I love it. Thanks, and keep up the good work. Marco From tameyer at ihug.co.nz Thu Nov 4 23:41:47 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Nov 4 23:42:27 2004 Subject: [spambayes-dev] i18n revisited In-Reply-To: Message-ID: > In my opinion gettext is more suitable to partition by > applications domains than logic ones. So we'll probably have (at first) one for all the Outlook scripts and one for sb_server? > I'll read your new patches and see if I can manage > to translate the html without screwing up everything. :) =Tony.Meyer From hernan at orgmf.com.ar Fri Nov 5 11:55:59 2004 From: hernan at orgmf.com.ar (=?iso-8859-1?Q?Hern=E1n_Mart=EDnez_Foffani?=) Date: Fri Nov 5 11:55:18 2004 Subject: [spambayes-dev] i18n revisited In-Reply-To: Message-ID: >> In my opinion gettext is more suitable to partition by >> applications domains than logic ones. > > So we'll probably have (at first) one for all the Outlook > scripts and one for sb_server? That's an option. But where are we going to put the common strings (the ones in the spambayes directory)? We know that one big, single .po file will work, but it would be hard for translators to maintain without the right tools. Alternatively we can generate several different .po files and merge them later into a single .mo file. I'd rather not require translators to install the complete GNU gettext package, but I don't know if we can avoid that (it has nice tools to compare .po files, etc.) Anyway, this isn't urgent at the moment. For coders (our concern right now) we only need to know which domain to use.
In the current version of i18n.py the domain is set only there, so if we find that it leads to a mess we can change it later. -H. From ta-meyer at ihug.co.nz Tue Nov 9 01:51:38 2004 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Tue Nov 9 01:52:19 2004 Subject: [spambayes-dev] 1.0.1 Release Message-ID: Hi everyone, I'm thinking about putting out a 1.0.1 release at some point next week. Are there any objections to this? It will be bugfix only from the 1_0_release CVS branch. I've merged in some of the bugfixes that have been committed since 1.0 and will do the rest over the next week or so, and test the merges more extensively. There have been quite a few bugfixes since 1.0, so it seems worth getting these out. A 1.1a1 release is probably still a while away, though, since it would be good to get several more things (i18n, Outlook buttons, storage options, training options) done (or at least mostly done) before that. =Tony.Meyer From ta-meyer at ihug.co.nz Tue Nov 9 06:49:58 2004 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Tue Nov 9 06:50:33 2004 Subject: [spambayes-dev] Outlook dialogs not calling onClose() Message-ID: As far as I can tell, the OnClose() method of dialogs does not get called when the "Close" button of the SpamBayes Manager dialog is clicked. Is there any chance that anyone knows why this is? It's causing trouble with failing to verify option values, but I can't figure out why it's not being called. All the other handlers (Notify, Destroy, Command, etc) get called as they should, but not Close. The button is IDOK, which I presume is correct. Any hints would be great :) =Tony.Meyer From kenny.pitt at gmail.com Tue Nov 9 15:45:20 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Tue Nov 9 15:45:26 2004 Subject: [spambayes-dev] 1.0.1 Release In-Reply-To: References: Message-ID: <2a052b99041109064569cabbed@mail.gmail.com> Tony Meyer wrote: > I'm thinking about putting out a 1.0.1 release at some point next week.
Are > there any objections to this? > > It will be bugfix only from the 1_0_release CVS branch. I've merged in some > of the bugfixes that have been committed since 1.0 and will do the rest over > the next week or so, and test the merges more extensively. > > There have been quite a few bugfixes since 1.0, so it seems worth getting > these out. A 1.1a1 release is probably still a while away, though, since it > would be good to get several more things (i18n, Outlook buttons, storage > options, training options) done (or at least mostly done) before that. +1, and thanks for all your work getting these fixes merged in. -- Kenny Pitt From richie at entrian.com Tue Nov 9 21:02:32 2004 From: richie at entrian.com (Richie Hindle) Date: Tue Nov 9 21:02:36 2004 Subject: [spambayes-dev] 1.0.1 Release In-Reply-To: <2a052b99041109064569cabbed@mail.gmail.com> References: <2a052b99041109064569cabbed@mail.gmail.com> Message-ID: <0i82p059gvo0cdlvunssnma59v4p4t7buc@4ax.com> [Kenny] > +1, and thanks for all your work getting these fixes merged in. Likewise on both counts. -- Richie Hindle richie@entrian.com From fusion at clanspum.net Wed Nov 10 05:38:33 2004 From: fusion at clanspum.net (Daniel Lyons) Date: Wed Nov 10 05:35:52 2004 Subject: [spambayes-dev] Question about tokenize_word and Tokenizer.tokenize_body Message-ID: <41919B49.70807@clanspum.net> Hi, At the very end of spambayes/tokenizer.py (version 1.33) in Tokenizer.tokenize_body, are lines 1593 to 1601: for w in text.split(): n = len(w) # Make sure this range matches in tokenize_word(). if 3 <= n <= maxword: yield w elif n >= 3: for t in tokenize_word(w): yield t The lines inside the for loop there mirror those of the function tokenize_word, lines 690-695: def tokenize_word(word, _len=len, maxword=options["Tokenizer", "skip_max_word_size"]): n = _len(word) # Make sure this range matches in tokenize(). 
if 3 <= n <= maxword: yield word This leads me to believe that tokens found in the body text are being generated twice by the tokenizer. This of course isn't causing problems in the classifier because it uses the unique tokenlist, unlike Graham's mechanism, using a set object. But, these functions both contain a comment referring to each other about the range being the same. I'm unclear on the benefit of duplicating the code since ultimately "all roads lead to Rome," that is, via tokenize_word. What's the real purpose to this duplicated effort? Thanks in advance, -- Daniel http://www.storytotell.org -- Tell It! From tameyer at ihug.co.nz Wed Nov 10 23:00:13 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Nov 10 23:00:53 2004 Subject: [spambayes-dev] Question about tokenize_word andTokenizer.tokenize_body In-Reply-To: Message-ID: > At the very end of spambayes/tokenizer.py (version 1.33) in > Tokenizer.tokenize_body, are lines 1593 to 1601: > > for w in text.split(): > n = len(w) > # Make sure this range matches in tokenize_word(). > if 3 <= n <= maxword: > yield w > > elif n >= 3: > for t in tokenize_word(w): > yield t > > The lines inside the for loop there mirror those of the function > tokenize_word, lines 690-695: > > def tokenize_word(word, _len=len, maxword=options["Tokenizer", > > "skip_max_word_size"]): > n = _len(word) > # Make sure this range matches in tokenize(). > if 3 <= n <= maxword: > yield word > > This leads me to believe that tokens found in the body text are being > generated twice by the tokenizer. Nope. tokenize_word() either spits out the same word/token that it was given (if the length is between 3 and maxword inclusive), or it spits out appropriate 'skip' tokens if the word is longer than maxword. In the code quoted above, tokenize_word() only gets called if the word is longer than maxword, so it will only be producing the various skip tokens. It is in fact reducing code duplication by not including the skip stuff there. 
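As a rough illustration of the point above (a simplified stand-in, not the actual tokenizer.py code): the sketch below assumes a maxword of 12 and a made-up "skip:" token format, and only shows that each word passes through exactly one branch, so nothing is generated twice.

```python
# Simplified stand-in for the two pieces of spambayes/tokenizer.py quoted
# in this thread.  MAXWORD and the skip-token format are assumptions made
# for illustration; the real code reads
# options["Tokenizer", "skip_max_word_size"] and does more for long words.
MAXWORD = 12


def tokenize_word(word, maxword=MAXWORD):
    n = len(word)
    if 3 <= n <= maxword:
        # Normal-sized word: handed back unchanged.
        yield word
    elif n >= 3:
        # Over-long word: emit a summary token instead of the word itself.
        yield "skip:%s %d" % (word[0], n // 10 * 10)


def tokenize_body_words(text, maxword=MAXWORD):
    for w in text.split():
        n = len(w)
        if 3 <= n <= maxword:
            # Fast path: tokenize_word() is not called at all.
            yield w
        elif n >= 3:
            # Only over-long words reach tokenize_word(), so every word
            # contributes tokens from exactly one branch.
            for t in tokenize_word(w, maxword):
                yield t


tokens = list(tokenize_body_words("hi cheap antidisestablishmentarianism pills"))
print(tokens)
```

Words shorter than three characters are dropped, normal words appear once, and the long word is replaced by a single skip-style token.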
> What's the real purpose to this duplicated effort? I presume you mean the bit about "if 3 <= n <= maxword". True, if this wasn't there and the code read instead: for w in text.split(): for t in tokenize_word(w): yield t The effect would be identical. I suspect (Tim may correct me here) that this is there for the purposes of efficiency - i.e. that it's cheaper to only call the other function if it is going to be used (rather than calling it to simply give the same token back). Given that this function is highly used, little things count. If you care you could run timeit tests over the function with various bodies (e.g. empty, small, medium, large, including email addresses, not including email addresses, lots of skips, no skips, etc) with the two variants. If the simpler code was in fact faster (unlikely, I think), then maybe it should be replaced. =Tony.Meyer From skip at pobox.com Wed Nov 10 23:44:41 2004 From: skip at pobox.com (Skip Montanaro) Date: Wed Nov 10 23:44:52 2004 Subject: [spambayes-dev] Question about tokenize_word andTokenizer.tokenize_body In-Reply-To: References: Message-ID: <16786.39385.148864.969012@montanaro.dyndns.org> > for w in text.split(): > n = len(w) > # Make sure this range matches in tokenize_word(). > if 3 <= n <= maxword: > yield w > > elif n >= 3: > for t in tokenize_word(w): > yield t Maybe to make the intent clearer, the elif test should be elif n > maxword: Skip From kenny.pitt at gmail.com Thu Nov 11 18:26:00 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Thu Nov 11 18:26:04 2004 Subject: [spambayes-dev] Running with I18n changes Message-ID: <2a052b99041111092662fa7408@mail.gmail.com> I just did a CVS update and picked up the I18n changes for the first time. Now I'm having trouble getting the Outlook add-in to work again. My SpamBayes dropdown comes up as an empty box, and the Spam/Not Spam buttons don't change when I move in and out of my Junk and Unsure folders, and the Spam button doesn't do anything when I click it. 
I tried running with the trace collector in Pythonwin, but didn't get any exceptions (or for that matter, any trace output at all). Are there any special steps I need to take to set up the I18n stuff before SpamBayes will work properly? -- Kenny Pitt From kenny.pitt at gmail.com Thu Nov 11 22:40:07 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Thu Nov 11 22:40:10 2004 Subject: [spambayes-dev] Re: Running with I18n changes In-Reply-To: <2a052b99041111092662fa7408@mail.gmail.com> References: <2a052b99041111092662fa7408@mail.gmail.com> Message-ID: <2a052b9904111113401e877465@mail.gmail.com> Kenny Pitt wrote: > I just did a CVS update and picked up the I18n changes for the first > time. Now I'm having trouble getting the Outlook add-in to work > again. [snip] > > Are there any special steps I need to take to set up the I18n stuff > before SpamBayes will work properly? Nevermind. I found the problem, but it's the last thing I would have expected and I have no idea how it happened. The Outlook add-in apparently was no longer registered, so running "addin.py --register" corrected it. The weird part is that I had just been running the Outlook add-in and noticed that a change I had seen come through checkins wasn't there. I exited from Outlook, did a "cvs update", and immediately restarted Outlook. It was at this point that SpamBayes was no longer active. It's beyond me what could have happened in a CVS update that would result in the addin getting unregistered, but if you ever happen to see this bizarre behavior then you can learn from my mistake. Always expect the unexpected and just re-register. -- Kenny Pitt From tameyer at ihug.co.nz Thu Nov 11 22:26:35 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Nov 11 22:42:12 2004 Subject: [spambayes-dev] Running with I18n changes In-Reply-To: Message-ID: > I just did a CVS update and picked up the I18n changes for the first > time. Now I'm having trouble getting the Outlook add-in to work > again. My bad, sorry. 
I had WinMerge set to ignore whitespace and used it to remove some additions I have in addin.py before I checked in one of the recent changes. But I left it with code indented when it shouldn't have, so addin.py of CVS one minute ago will give a syntax error (hence the no traceback - it doesn't run). I've checked in a fix for this and verified it on a clean install on another machine. My apologies! > Are there any special steps I need to take to set up the I18n stuff > before SpamBayes will work properly? There shouldn't be (assuming you're after English), no. If you want to try the i18n stuff out, then you need to grab the language files that are on the tracker and put them in the 'languages' directory (or do some translating yourself ) as they haven't been checked in yet. =Tony.Meyer From tameyer at ihug.co.nz Thu Nov 11 22:54:05 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Nov 11 23:01:53 2004 Subject: [spambayes-dev] Re: Running with I18n changes In-Reply-To: Message-ID: > Nevermind. I found the problem, but it's the last thing I > would have expected and I have no idea how it happened. The > Outlook add-in apparently was no longer registered, so > running "addin.py --register" corrected it. Did you possibly CVS up again (after my checkin today) before reregistering? A syntax error will unregister the addin, I believe - IIRC anything going wrong loading it will cause it to be unregistered. > Always expect the unexpected and just re-register. But with "addin.py --debug", of course ;) =Tony.Meyer From kenny.pitt at gmail.com Thu Nov 11 23:29:19 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Thu Nov 11 23:29:22 2004 Subject: [spambayes-dev] Re: Running with I18n changes In-Reply-To: References: Message-ID: <2a052b990411111429277e29b8@mail.gmail.com> Tony Meyer wrote: > > Nevermind. I found the problem, but it's the last thing I > > would have expected and I have no idea how it happened. 
The > > Outlook add-in apparently was no longer registered, so > > running "addin.py --register" corrected it. > > Did you possibly CVS up again (after my checkin today) before reregistering? > A syntax error will unregister the addin, I believe - IIRC anything going > wrong loading it will cause it to be unregistered. I did, indeed, so that was probably what happened. I was not aware of the unregister on error behavior. I guess it makes sense. If Python can't load the file that defines the COM object then COM better not try to instantiate it. -- Kenny Pitt From tameyer at ihug.co.nz Thu Nov 11 23:52:23 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Nov 11 23:52:59 2004 Subject: [spambayes-dev] Re: Running with I18n changes In-Reply-To: Message-ID: > I was not aware of the unregister on error behavior. You obviously haven't tried to run as much poorly put together code as I have <0.5 wink>... BTW, if you do want to try out the (Outlook) i18n stuff, you'll have to use CVS from before the Statistics tab addition (nice, thanks!), or figure out how to regenerate the i18n_dialog.py file appropriately (don't ask me!), as the one on the tracker is now broken (just as well as don't change the fundamental elements in the dialog very often). =Tony.Meyer From tim.peters at gmail.com Fri Nov 12 04:52:47 2004 From: tim.peters at gmail.com (Tim Peters) Date: Fri Nov 12 04:52:51 2004 Subject: [spambayes-dev] Question about tokenize_word andTokenizer.tokenize_body In-Reply-To: References: Message-ID: <1f7befae041111195229fa7b9e@mail.gmail.com> [Daniel Lyons] ... >> What's the real purpose to this duplicated effort? [Tony Meyer] > I presume you mean the bit about "if 3 <= n <= maxword". True, if this > wasn't there and the code read instead: > > for w in text.split(): > for t in tokenize_word(w): > yield t > > The effect would be identical. Right. > I suspect (Tim may correct me here) that this is there for the purposes of > efficiency - i.e. 
that it's cheaper to only call the other function if it is going > to be used (rather than calling it to simply give the same token back). > Given that this function is highly used, little things count. Right again. A function call in Python is enormously expensive compared to a simple yield, and speed matters here. Historically, it was all done inline -- tokenize_word() didn't exist at first. When it was added, this loop deferred to it in cases that were expensive anyway (there's some quite elaborate processing done when len > maxword), but kept the speed when the word was simply left alone. > If you care you could run timeit tests over the function with various > bodies (e.g. empty, small, medium, large, including email addresses, not > including email addresses, lots of skips, no skips, etc) with the two > variants. If the simpler code was in fact faster (unlikely, I think), then > maybe it should be replaced. I actually find the current spelling simpler! Seeing if 3 <= n <= maxword: yield w right inside the loop tells me immediately "ah, if it's a normal-sized word, we use it exactly as-is". It's helpful that the normal case is as obvious as possible. It's OK that abnormal cases are harder to understand. From tim.peters at gmail.com Fri Nov 12 04:59:18 2004 From: tim.peters at gmail.com (Tim Peters) Date: Fri Nov 12 04:59:21 2004 Subject: [spambayes-dev] Question about tokenize_word andTokenizer.tokenize_body In-Reply-To: <16786.39385.148864.969012@montanaro.dyndns.org> References: <16786.39385.148864.969012@montanaro.dyndns.org> Message-ID: <1f7befae0411111959223fe39e@mail.gmail.com> [Skip Montanaro] > Maybe to make the intent clearer, the elif test should be > > elif n > maxword: The you couldn't comment out the preceding "if" block without radically changing what the code does. As "elif n >= 3", it doesn't matter whether the preceding "if" block exists or not (except to make "elif" syntactically legal). 
There's really no reason to make the conditions on 'if's mutually exclusive. What's important is that an 'if'/'elif' condition be correct for the block it controls. Making conditions as broad as possible (but consistent with correctness) aids rapid code experimentation. Then again, I can't claim this particular code is even subject to slow-motion experimentation anymore . From fusion at clanspum.net Fri Nov 12 06:26:56 2004 From: fusion at clanspum.net (Daniel Lyons) Date: Fri Nov 12 06:24:11 2004 Subject: [spambayes-dev] Question about tokenize_word andTokenizer.tokenize_body In-Reply-To: <1f7befae041111195229fa7b9e@mail.gmail.com> References: <1f7befae041111195229fa7b9e@mail.gmail.com> Message-ID: <419449A0.208@clanspum.net> Tim Peters wrote: > I actually find the current spelling simpler! Seeing > > if 3 <= n <= maxword: > yield w > >right inside the loop tells me immediately "ah, if it's a normal-sized >word, we use it exactly as-is". It's helpful that the normal case is >as obvious as possible. It's OK that abnormal cases are harder to >understand. > > I wish I could agree, but then again, I had to post here to be sure. :) I guess I'm also sorry to hear that a function call is such a hit in Python that there are cases where one would want manually inline it, but I guess nothing's perfect. On the other hand, these bits were the only bits in all of tokenizer.py, classifier.py and chi2.py which I found strange or off-putting. The code is beautiful, I learned a lot of interesting Python idioms and design patterns from it. I suppose none of this is news considering the authorship and time put into it studying it. I learned more about spam just from reading these three files than I thought I would ever know, not to mention statistics. An excellent piece of study code, I'm very thankful it's public and under a liberal license. Thanks for taking the time to answer my question, all of you! -- Daniel http://www.storytotell.org -- Tell It! 
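For anyone who wants to try the comparison suggested in this thread, here is a minimal sketch under stated assumptions: tokenize_word() is a simplified stand-in (assumed maxword of 12, made-up skip token), and the sample text is invented. It checks that the loop with and without the inline fast path yields the same tokens, shows what a narrower "n > maxword" test would do if the fast path were commented out, and times the two equivalent variants with timeit. No timing result is claimed here; run it to see numbers for your own machine.

```python
import timeit

MAXWORD = 12  # assumed; the real value comes from the SpamBayes options


def tokenize_word(word, maxword=MAXWORD):
    # Simplified stand-in for spambayes' tokenize_word().
    n = len(word)
    if 3 <= n <= maxword:
        yield word
    elif n >= 3:
        yield "skip:%s %d" % (word[0], n // 10 * 10)  # hypothetical format


def with_fast_path(text, maxword=MAXWORD):
    # Current spelling: normal-sized words are yielded inline.
    for w in text.split():
        n = len(w)
        if 3 <= n <= maxword:
            yield w
        elif n >= 3:
            for t in tokenize_word(w, maxword):
                yield t


def without_fast_path(text, maxword=MAXWORD):
    # "Simpler" spelling: every word goes through tokenize_word().
    for w in text.split():
        for t in tokenize_word(w, maxword):
            yield t


def narrow_test_without_fast_path(text, maxword=MAXWORD):
    # The "n > maxword" spelling with the fast path commented out:
    # normal-sized words silently disappear.
    for w in text.split():
        if len(w) > maxword:
            for t in tokenize_word(w, maxword):
                yield t


sample = "buy cheap pills at antidisestablishmentarianism prices " * 200

same_tokens = list(with_fast_path(sample)) == list(without_fast_path(sample))
narrow_tokens = list(narrow_test_without_fast_path(sample))

fast = timeit.timeit(lambda: list(with_fast_path(sample)), number=200)
plain = timeit.timeit(lambda: list(without_fast_path(sample)), number=200)

print("same tokens:", same_tokens)
print("with fast path: %.3fs  without: %.3fs" % (fast, plain))
```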
From hernan at orgmf.com.ar Fri Nov 12 10:12:57 2004 From: hernan at orgmf.com.ar (=?us-ascii?Q?Hernan_Martinez_Foffani?=) Date: Fri Nov 12 10:12:20 2004 Subject: [spambayes-dev] Re: Running with I18n changes In-Reply-To: Message-ID: > BTW, if you do want to try out the (Outlook) i18n stuff, you'll have > to use CVS from before the Statistics tab addition (nice, thanks!), > or figure out how to regenerate the i18n_dialog.py file appropriately > (don't ask me!), as the one on the tracker is now broken (just as > well as don't change the fundamental elements in the dialog very > often). Just with rc2py but with a zero for the third parameter. For instance, in Outlook2000\dialogs\resources directory: D:\cvs\...\DIALOGS> rc2py.py es_dialogs.rc i18n_dialogs.py 0 BITMAP IDB_SBLOGO sblogo.bmp BITMAP IDB_SBWIZLOGO sbwizlogo.bmp BITMAP IDB_FOLDERS folders.bmp D:\cvs\...\DIALOGS> The difference with the bundled dialogs.py is that the last one has gettext "enabled" (the strings were wrapped by "_()") and the former hasn't because we assume that the .rc file in the DIALOGS directory is already translated. Regards, -H.
From ETallard at entel.cl Fri Nov 12 22:02:52 2004 From: ETallard at entel.cl (Tallard Cornejo Eduardo Antonio) Date: Fri Nov 12 23:00:03 2004 Subject: [spambayes-dev] Update Message-ID: I have the version 1rc2. I want to install the version 1.0, but I don't miss my actual configuration, what can I do? Regards Eduardo -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20041112/663e33f0/attachment.html From kennypitt at hotmail.com Sat Nov 13 00:14:20 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Sat Nov 13 00:15:11 2004 Subject: [spambayes-dev] Update In-Reply-To: Message-ID: You should be able to install the 1.0 version directly over your existing 1.0rc2. The installation does not affect your configuration or training data. For future reference, the spambayes@python.org mailing list is the most appropriate forum for this type of question. The spambayes-dev@python.org list is primarily for discussion about the development of SpamBayes. -- Kenny Pitt _____ From: spambayes-dev-bounces@python.org [mailto:spambayes-dev-bounces@python.org] On Behalf Of Tallard Cornejo Eduardo Antonio Sent: Friday, November 12, 2004 4:03 PM To: spambayes-dev@python.org Subject: [spambayes-dev] Update I have the version 1rc2. I want to install the version 1.0, but I don't miss my actual configuration, what can I do? -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20041112/9fd72bb0/attachment.html From seandarcy at hotmail.com Sun Nov 14 17:55:30 2004 From: seandarcy at hotmail.com (sean darcy) Date: Sun Nov 14 17:56:02 2004 Subject: [spambayes-dev] deprecation warniing with python-2.4 Message-ID: Using CVS from about Nov 1. On startup: sb_server.py SpamBayes POP3 Proxy Version 1.0rc1 (May 2004) and engine SpamBayes Engine Version 0.3 (January 2004). Loading database... SMTP Listener on port 25 is proxying mail.optonline.net:25 Listener on port 110 is proxying mail................. ................................................... User interface url is http://localhost:8880/ /usr/lib/python2.4/email/__init__.py:43: DeprecationWarning: 'strict' argument i s deprecated (and ignored) return Parser(*args, **kws).parsestr(s) python-2.4-0.b2 sean From tameyer at ihug.co.nz Mon Nov 15 01:33:22 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Nov 15 01:33:58 2004 Subject: [spambayes-dev] deprecation warniing with python-2.4 In-Reply-To: Message-ID: > Using CVS from about Nov 1. [...]
> /usr/lib/python2.4/email/__init__.py:43: DeprecationWarning: 'strict' > argument i s deprecated (and ignored) > return Parser(*args, **kws).parsestr(s) Yes, Barry deprecated the 'strict' keyword arg in the email package (given the new FeedParser system, it's presumably reasonably worthless in email 3.0+). On the 9th, I tidied up some of the code, so I think sb_server and sb_imapfilter will manage to avoid this warning if you cvs-up. I don't recall where offhand, but I believe there's still at least one case of using strict left - that one might be trickier, although since it defaulted to False (IIRC) and False is (again, IIRC) what we want, it might not be too bad. If you want to root out any remaining uses and post a bug/patch to SF, go ahead! You can assign it to me (Anadelonbrin) if you like. By the time 1.1a1 is released that source should work without any warnings from Python 2.4. 1.0.1 should work with Python 2.4, but might still raise the warnings (they're only warnings after all - and the 1.0.x branch will probably be dead before Python 2.5). =Tony.Meyer From brown at dui-dwi.com Tue Nov 16 11:37:29 2004 From: brown at dui-dwi.com (DUI-DWI) Date: Tue Nov 16 11:38:51 2004 Subject: [spambayes-dev] Bug found in latest CVS. In-Reply-To: Message-ID: <20041116103831.PNLV8988.imf16aec.mail.bellsouth.net@seeker> Tony, First off, great job on all of the check ins as of late. I've found a bug in the latest cvs, however. When you try and get the spam clues to a message that was trained prior, nothing happens. Details below. Let me know if you need any more info. 
-------------------------------------- Traceback (most recent call last): File "D:\!PROGR~1\PYTHON\Lib\site-packages\win32com\server\policy.py", line 283, in _Invoke_ return self._invoke_(dispid, lcid, wFlags, args) File "D:\!PROGR~1\PYTHON\Lib\site-packages\win32com\server\policy.py", line 288, in _invoke_ return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None) File "D:\!PROGR~1\PYTHON\Lib\site-packages\win32com\server\policy.py", line 581, in _invokeex_ return func(*args) File "D:\!Programs\SpamBayes for Outlook\spambayes\Outlook2000\addin.py", line 237, in OnClick self.handler(*self.args) File "D:\!Programs\SpamBayes for Outlook\spambayes\Outlook2000\addin.py", line 482, in ShowClues push("This message has %sbeen trained%s." % \ exceptions.KeyError: '1' -------------------------------------- Webmaster -----Original Message----- From: spambayes-dev-bounces@python.org [mailto:spambayes-dev-bounces@python.org] On Behalf Of Tony Meyer Sent: Sunday, November 14, 2004 7:33 PM To: 'sean darcy'; spambayes-dev@python.org Subject: RE: [spambayes-dev] deprecation warniing with python-2.4 > Using CVS from about Nov 1. [...] > /usr/lib/python2.4/email/__init__.py:43: DeprecationWarning: 'strict' > argument i s deprecated (and ignored) > return Parser(*args, **kws).parsestr(s) Yes, Barry deprecated the 'strict' keyword arg in the email package (given the new FeedParser system, it's presumably reasonably worthless in email 3.0+). On the 9th, I tidied up some of the code, so I think sb_server and sb_imapfilter will manage to avoid this warning if you cvs-up. I don't recall where offhand, but I believe there's still at least one case of using strict left - that one might be trickier, although since it defaulted to False (IIRC) and False is (again, IIRC) what we want, it might not be too bad. If you want to root out any remaining uses and post a bug/patch to SF, go ahead! You can assign it to me (Anadelonbrin) if you like. 
By the time 1.1a1 is released that source should work without any warnings from Python 2.4. 1.0.1 should work with Python 2.4, but might still raise the warnings (they're only warnings after all - and the 1.0.x branch will probably be dead before Python 2.5). =Tony.Meyer _______________________________________________ spambayes-dev mailing list spambayes-dev@python.org http://mail.python.org/mailman/listinfo/spambayes-dev From brown at dui-dwi.com Tue Nov 16 11:38:46 2004 From: brown at dui-dwi.com (DUI-DWI) Date: Tue Nov 16 11:39:49 2004 Subject: [spambayes-dev] RE: Bug found in latest CVS. Message-ID: <20041116103944.PNOQ8988.imf16aec.mail.bellsouth.net@seeker> Forgot to include the last line: ----------------------------------- Traceback (most recent call last): File "D:\!PROGR~1\PYTHON\Lib\site-packages\win32com\server\policy.py", line 283, in _Invoke_ return self._invoke_(dispid, lcid, wFlags, args) File "D:\!PROGR~1\PYTHON\Lib\site-packages\win32com\server\policy.py", line 288, in _invoke_ return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None) File "D:\!PROGR~1\PYTHON\Lib\site-packages\win32com\server\policy.py", line 581, in _invokeex_ return func(*args) File "D:\!Programs\SpamBayes for Outlook\spambayes\Outlook2000\addin.py", line 237, in OnClick self.handler(*self.args) File "D:\!Programs\SpamBayes for Outlook\spambayes\Outlook2000\addin.py", line 482, in ShowClues push("This message has %sbeen trained%s." % \ exceptions.KeyError: '1' pythoncom error: Python error invoking COM method. ----------------------------------- Webmaster -----Original Message----- From: DUI-DWI [mailto:brown@dui-dwi.com] Sent: Tuesday, November 16, 2004 5:37 AM To: 'Tony Meyer'; 'spambayes-dev@python.org' Subject: Bug found in latest CVS. Tony, First off, great job on all of the check ins as of late. I've found a bug in the latest cvs, however. When you try and get the spam clues to a message that was trained prior, nothing happens. Details below. 
Let me know if you need any more info. -------------------------------------- Traceback (most recent call last): File "D:\!PROGR~1\PYTHON\Lib\site-packages\win32com\server\policy.py", line 283, in _Invoke_ return self._invoke_(dispid, lcid, wFlags, args) File "D:\!PROGR~1\PYTHON\Lib\site-packages\win32com\server\policy.py", line 288, in _invoke_ return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None) File "D:\!PROGR~1\PYTHON\Lib\site-packages\win32com\server\policy.py", line 581, in _invokeex_ return func(*args) File "D:\!Programs\SpamBayes for Outlook\spambayes\Outlook2000\addin.py", line 237, in OnClick self.handler(*self.args) File "D:\!Programs\SpamBayes for Outlook\spambayes\Outlook2000\addin.py", line 482, in ShowClues push("This message has %sbeen trained%s." % \ exceptions.KeyError: '1' -------------------------------------- Webmaster -----Original Message----- From: spambayes-dev-bounces@python.org [mailto:spambayes-dev-bounces@python.org] On Behalf Of Tony Meyer Sent: Sunday, November 14, 2004 7:33 PM To: 'sean darcy'; spambayes-dev@python.org Subject: RE: [spambayes-dev] deprecation warniing with python-2.4 > Using CVS from about Nov 1. [...] > /usr/lib/python2.4/email/__init__.py:43: DeprecationWarning: 'strict' > argument i s deprecated (and ignored) > return Parser(*args, **kws).parsestr(s) Yes, Barry deprecated the 'strict' keyword arg in the email package (given the new FeedParser system, it's presumably reasonably worthless in email 3.0+). On the 9th, I tidied up some of the code, so I think sb_server and sb_imapfilter will manage to avoid this warning if you cvs-up. I don't recall where offhand, but I believe there's still at least one case of using strict left - that one might be trickier, although since it defaulted to False (IIRC) and False is (again, IIRC) what we want, it might not be too bad. If you want to root out any remaining uses and post a bug/patch to SF, go ahead! You can assign it to me (Anadelonbrin) if you like. 
By the time 1.1a1 is released that source should work without any warnings from Python 2.4. 1.0.1 should work with Python 2.4, but might still raise the warnings (they're only warnings after all - and the 1.0.x branch will probably be dead before Python 2.5).

=Tony.Meyer

From theller at python.net Tue Nov 16 16:39:52 2004
From: theller at python.net (Thomas Heller)
Date: Tue Nov 16 16:39:13 2004
Subject: [spambayes-dev] Cannot find saved message
Message-ID: <8y91sszb.fsf@python.net>

From time to time, I'm getting this traceback, in the sb_imapfilter:

Training
Training ham folder INBOX.spambayes.train_ham
.................................... 0 trained.
Training spam folder INBOX.spambayes.train_spam
................................................................................
................. 0 trained.
Training took 0.4400 seconds, 0 messages were trained.
Classifying
....***************************************.Traceback (most recent call last):
  File "sb_imapfilter.py", line 1040, in ?
    run()
  File "sb_imapfilter.py", line 1024, in run
    imap_filter.Filter()
  File "sb_imapfilter.py", line 892, in Filter
    self.unsure_folder, self.ham_folder)
  File "sb_imapfilter.py", line 788, in Filter
    msg.Save()
  File "sb_imapfilter.py", line 559, in Save
    raise BadIMAPResponseError("Cannot find saved message", "")
BadIMAPResponseError: The command 'Cannot find saved message' failed to give an OK response.
> c:\sf\spambayes\scripts\sb_imapfilter.py(559)Save()
-> raise BadIMAPResponseError("Cannot find saved message", "")
(Pdb)

Does anyone have a solution to this, before I examine this further?

Thanks,

Thomas

From tameyer at ihug.co.nz Wed Nov 17 01:07:40 2004
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Wed Nov 17 01:07:53 2004
Subject: [spambayes-dev] RE: Bug found in latest CVS.
In-Reply-To: <20041116103944.PNOQ8988.imf16aec.mail.bellsouth.net@seeker>
Message-ID:

> I've found a bug in the latest cvs, however.
> When you try and get the spam clues to a
> message that was trained prior, nothing happens.
> Details below. Let me know if you need any more info.
[...]
> File "D:\!Programs\SpamBayes for Outlook\spambayes\Outlook2000\addin.py",
> line 482, in ShowClues
> push("This message has %sbeen trained%s." % \
> exceptions.KeyError: '1'

Thanks. I'm not sure how that made it through my testing. Anyway, I've checked in a fix, so it should appear in anon CVS shortly.

=Tony.Meyer

From tameyer at ihug.co.nz Wed Nov 17 01:33:35 2004
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Wed Nov 17 01:33:40 2004
Subject: [spambayes-dev] Cannot find saved message
In-Reply-To: <8y91sszb.fsf@python.net>
Message-ID:

> From time to time, I'm getting this traceback, in the sb_imapfilter:
[...]
> File "sb_imapfilter.py", line 559, in Save
> raise BadIMAPResponseError("Cannot find saved message", "")
> BadIMAPResponseError: The command 'Cannot find saved message'
> failed to give an OK response.
[...]
> Does anyone have a solution to this, before I examine this further?

Not a solution, but there is the material in here:

[ 1023797 ] Imapfilter fails: 'Cannot find saved message'

I haven't managed to figure this one out yet, sorry. (If you have the time to, that would be great!). I believe the problem comes from the way imapfilter now waits for an EXISTS message from the IMAP server before trying to find the new message (this is to try and overcome a problem the old version had with servers that wouldn't immediately find new messages).

However, if you're getting as far as 559, then an EXISTS response has been received, but the newly created message isn't found anyway. (Maybe a different message arrived, but the one we created isn't available? That would be weird).

Running with -i4 ought to give enough detail of the IMAP4 conversation that you can see why it's failing.
If you don't have time to look at it, if you could attach your -i4 output to the tracker (removing your username/password details) and remind me to get to this quickly, I'll try and do that.

=Tony.Meyer

From theller at python.net Thu Nov 18 22:00:58 2004
From: theller at python.net (Thomas Heller)
Date: Thu Nov 18 22:00:32 2004
Subject: [spambayes-dev] Re: Cannot find saved message
References: <8y91sszb.fsf@python.net>
Message-ID:

"Tony Meyer" writes:

>> From time to time, I'm getting this traceback, in the sb_imapfilter:
> [...]
>> File "sb_imapfilter.py", line 559, in Save
>> raise BadIMAPResponseError("Cannot find saved message", "")
>> BadIMAPResponseError: The command 'Cannot find saved message'
>> failed to give an OK response.
> [...]
>> Does anyone have a solution to this, before I examine this further?
>
> Not a solution, but there is the material in here:
>
> [ 1023797 ] Imapfilter fails: 'Cannot find saved message'
>
>
> I haven't managed to figure this one out yet, sorry. (If you have the
> time to, that would be great!). I believe the problem comes from the
> way imapfilter now waits for an EXISTS message from the IMAP server
> before trying to find the new message (this is to try and overcome a
> problem the old version had with servers that wouldn't immediately
> find new messages).
>
> However, if you're getting as far as 559, then an EXISTS response has
> been received, but the newly created message isn't found anyway.
> (Maybe a different message arrived, but the one we created isn't
> available? That would be weird).
>
> Running with -i4 ought to give enough detail of the IMAP4 conversation
> that you can see why it's failing. If you don't have time to look at
> it, if you could attach your -i4 output to the tracker (removing your
> username/password details) and remind me to get to this quickly, I'll
> try and do that.

Maybe related, maybe not - running with -i4 seems (?) to cure the problem. At least it has not yet happened again.
Thomas

From theller at python.net Fri Nov 19 08:30:57 2004
From: theller at python.net (Thomas Heller)
Date: Fri Nov 19 08:30:29 2004
Subject: [spambayes-dev] small fix
Message-ID:

Found a small typo, misspelling of BadIMAPResponseError.

Thomas

Index: sb_imapfilter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_imapfilter.py,v
retrieving revision 1.41
diff -c -r1.41 sb_imapfilter.py
*** sb_imapfilter.py 13 Oct 2004 02:42:04 -0000 1.41
--- sb_imapfilter.py 19 Nov 2004 07:29:00 -0000
***************
*** 529,535 ****
              else:
                  command = "append %s %s %s %s" % (self.folder.name, flgs, tme,
                                                    self.as_string)
!                 raise BadIMAPReponseError(command)

          if self.previous_folder is None:
              self.imap_server.SelectFolder(self.folder.name)
--- 529,535 ----
              else:
                  command = "append %s %s %s %s" % (self.folder.name, flgs, tme,
                                                    self.as_string)
!                 raise BadIMAPResponseError(command)

          if self.previous_folder is None:
              self.imap_server.SelectFolder(self.folder.name)

From bigjim at cgmailbox.com Sun Nov 21 01:46:19 2004
From: bigjim at cgmailbox.com (Jim Barsz)
Date: Sun Nov 21 01:46:46 2004
Subject: [spambayes-dev] FAQs
Message-ID: <419FE55B.6040305@cgmailbox.com>

How Do I Get Spambayes to load in the systray on startup?

From tameyer at ihug.co.nz Mon Nov 22 01:02:51 2004
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Mon Nov 22 01:04:00 2004
Subject: [spambayes-dev] small fix
In-Reply-To:
Message-ID:

> Found a small typo, misspelling of BadIMAPResponseError.

Thanks; fixed.

=Tony.Meyer

From kennypitt at hotmail.com Mon Nov 22 17:33:19 2004
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Mon Nov 22 17:34:05 2004
Subject: [spambayes-dev] Incorrect Outlook stats
Message-ID:

I just started working on getting the extended statistics into the Outlook addin, and I noticed something in the stat tracking that isn't doing what I think it should.
My SpamBayes accuracy has been so good that I have had no false positives or negatives since Tony added permanent accumulation of the statistics. So, in order to test the statistics for incorrect classifications, I trained one of my good messages as spam and checked to see that it showed up as a false negative. I then trained the message back to good, but the statistics still showed that I had one false negative. It seems like the correct behavior would be to erase the false negative if the message is trained back to its original classification. I thought it would be a simple matter to check if we are training back to the original classification and just decrement the appropriate statistic. The non-Outlook apps store the original classification in the message info db, but it doesn't appear that the Outlook addin does this. Anyone (Tony?) have any suggestions on how to go about fixing this? -- Kenny Pitt From tameyer at ihug.co.nz Tue Nov 23 00:16:59 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Nov 23 00:17:34 2004 Subject: [spambayes-dev] Incorrect Outlook stats In-Reply-To: Message-ID: > I just started working on getting the extended statistics > into the Outlook addin, and I noticed something in the stat > tracking that isn't doing what I think it should. > > My SpamBayes accuracy has been so good that I have had no > false positives or negatives since Tony added permanent > accumulation of the statistics. That is good :) > So, in order to test the > statistics for incorrect classifications, I trained one of my > good messages as spam and checked to see that it showed up as > a false negative. > > I then trained the message back to good, but the statistics > still showed that I had one false negative. It seems like > the correct behavior would be to erase the false negative if > the message is trained back to its original classification. 
An odd case, to be sure :) > I thought it would be a simple matter to check if we are > training back to the original classification and just > decrement the appropriate statistic. The non-Outlook apps > store the original classification in the message info db, but > it doesn't appear that the Outlook addin does this. Anyone > (Tony?) have any suggestions on how to go about fixing this? This is similar to what I needed to do to get the original score/classification in the "show clues" message. In manager.classifier_data.message_db we store the trained status, but not the classification. However, we do store the original score in the "Spam" field (unless that option is turned off, or there was a problem doing so), and can figure it out from there*. However, when we train via the buttons we rescore the message, which changes this field, so that data is lost. AFAICT the only way** to fix this would be to add more information to the message_db (a la the non-Outlook version). I believe we can do this in a backwards-compatible way, although there will be a reasonable number of changes, I suspect. Should I go ahead and do this? * Of course, we store only the score, so if the thresholds have changed, all bets are off. ** Well, other than adding another field to the message, or something like that. =Tony.Meyer From tameyer at ihug.co.nz Tue Nov 23 00:29:27 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Nov 23 00:30:16 2004 Subject: [spambayes-dev] FAQs In-Reply-To: Message-ID: > How Do I Get Spambayes to load in the systray on startup? This would be a good suggestion, except that it's only necessary because of a bug that means that the installer doesn't offer to set this up for you. This will be fixed in 1.0.1, which I plan to put together today and announce either tomorrow or Thursday (NZ time). Thanks for the suggestion! 
=Tony.Meyer

From tameyer at ihug.co.nz Tue Nov 23 01:11:32 2004
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Nov 23 01:11:38 2004
Subject: [spambayes-dev] FAQ addition request
In-Reply-To:
Message-ID:

> I currently have two suggestions for improvement in the
> documentation area:
>
> The first deals with the Outlook plugin FAQ. The FAQ
> makes a big deal about SpamBayes being able to work with both
> POP3 and IMAP. Now I used to be a Coder, still do some, but
> I am now a Network Designer/Engineer, so I understand what
> these do, at least at a high level. But I've never stopped
> to think about what (potentially convoluted) protocol M$
> Outlook used to communicate with an Exchange server. I still
> haven't cared enough to find out either. But it is/was not
> clear to me that SpamBayes would work with an Exchange
> server. Even with the terse "Yes" answer to that question.
> Given the amount of time spent on POP3 and IMAP
> possibilities, the FAQ should really treat this question
> better, or have the Windows overview page answer this a bit
> more authoritatively.

I don't understand what could be more authoritative than "yes". "yes, it does"? "Yes, definitely"? "Yes. Yes. Yes."? <0.5 wink>

[...]
> The
> SpamBayes Outlook plug-in simply watches the Inbox, and
> optionally others, for new mail and attempts to apply its
> rule set to those messages. Thus it doesn't attempt to get
> between Outlook and Exchange, so there is no problem working
> with the above discussed delivery mechanism."

I don't believe that this makes it any more authoritative, but I'll add this explanation. Thanks for the contribution.

> My second suggestion is that, either in the FAQ or a
> new "anomaly" section, there needs to be an explanation of
> why "Undeliverable" messages are considered to be
> "unfilterable".
Please check FAQ 5.6 and see if that is a suitable explanation: =Tony.Meyer From ta-meyer at ihug.co.nz Tue Nov 23 05:02:01 2004 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Tue Nov 23 05:02:36 2004 Subject: [spambayes-dev] Is building with Python 2.4 and not Python 2.3 a bugfix? Message-ID: Hey everyone, I figure that there'll probably be a 1.0.2 (although maybe not a 1.0.3) before we get to a final 1.1 release. By the time we're ready to do 1.0.2, Python 2.4 will be all done. There are various reasons why building the binary with Python 2.4 would be better than with Python 2.3, but the important one is that Python 2.4 has email 3.0. This basically means the end of problems with malformed messages, which is still a reasonably common reported problem ("why am I getting this X-Spambayes-Exception header?"). However, 1.0.2 is only for bugfixes, not for new features. So my question is: is building with the newer Python valid for a bugfix release? (If not, then people just have to wait for 1.1, which will use 2.4). =Tony.Meyer From tim.peters at gmail.com Tue Nov 23 05:27:45 2004 From: tim.peters at gmail.com (Tim Peters) Date: Tue Nov 23 05:27:53 2004 Subject: [spambayes-dev] Is building with Python 2.4 and not Python 2.3 a bugfix? In-Reply-To: References: Message-ID: <1f7befae0411222027117df88b@mail.gmail.com> [Tony Meyer] > I figure that there'll probably be a 1.0.2 (although maybe not a 1.0.3) > before we get to a final 1.1 release. By the time we're ready to do 1.0.2, > Python 2.4 will be all done. > > There are various reasons why building the binary with Python 2.4 would be > better than with Python 2.3, but the important one is that Python 2.4 has > email 3.0. This basically means the end of problems with malformed > messages, which is still a reasonably common reported problem ("why am I > getting this X-Spambayes-Exception header?"). > > However, 1.0.2 is only for bugfixes, not for new features. 
> > So my question is: is building with the newer Python valid for a bugfix > release? It fixes bugs, right? (like "the end of problems with malformed messages") Micro releases of Python on Windows sometimes ship with new external libraries too, e.g., to fix Tk bugs, or to plug zlib "security holes". This seems much the same. There's a small chance that 2.4 will introduce a relevant bug, and then the decision to use it would look imprudent. In that case you'll be blamed. But if all it does is improve email handling, nobody will praise you. So there's the moral dilemma: you can improve peoples' lives, or cover your ass. Since it's not my ass, I vote you expose it . From ta-meyer at ihug.co.nz Tue Nov 23 05:30:41 2004 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Tue Nov 23 05:31:16 2004 Subject: [spambayes-dev] Release 1.0.1 Message-ID: Hi everyone, Thanks to a sprained ankle, a bit later than intended, but here we go...I've put together the source dists and binary for 1.0.1. If anyone wants to check these out, that would be great, although the changes are pretty minor compared to 1.0 (and I've run the tests that we have). If there are no issues with these, then I'll do the sourceforge dance and put out the announcements on Thursday (NZ time). Cheers, Tony From tameyer at ihug.co.nz Tue Nov 23 06:01:31 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Nov 23 06:02:06 2004 Subject: [spambayes-dev] Release 1.0.1 In-Reply-To: Message-ID: > If anyone wants to check these out, that would be > great [..] Actually, I think there is a problem with the .exe - but the source ones should be fine. Time to go home now, so I'll investigate further tomorrow. 
=Tony.Meyer From kennypitt at hotmail.com Tue Nov 23 14:59:30 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Tue Nov 23 15:00:13 2004 Subject: [spambayes-dev] Incorrect Outlook stats In-Reply-To: Message-ID: Tony Meyer wrote: >> So, in order to test the >> statistics for incorrect classifications, I trained one of my good >> messages as spam and checked to see that it showed up as a false >> negative. >> >> I then trained the message back to good, but the statistics still >> showed that I had one false negative. It seems like the correct >> behavior would be to erase the false negative if the message is >> trained back to its original classification. > > An odd case, to be sure :) Yes, I'll admit that, but that's the life of a developer, eh? I guess it's similar to what would happen if someone clicked the wrong training button on a message and then had to recover it, though. > AFAICT the only way** to fix this would be to add more information to > the message_db (a la the non-Outlook version). I believe we can do > this in a backwards-compatible way, although there will be a > reasonable number of changes, I suspect. Should I go ahead and do > this? > > * Of course, we store only the score, so if the thresholds have > changed, all bets are off. > > ** Well, other than adding another field to the message, or something > like that. I was afraid that would be the case. I wouldn't be opposed to adding an "original score" field to the message if that's the easiest way, but I suspect that putting it into the message db would be a better approach. The decisions we make about what stats to update would be controlled by both the original classification and the training status of the message, so it seems like it would be best to reset all of this when the training data is reset. That would be much easier to do in the message db than in the fields of the messages. 
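The bookkeeping discussed in this thread can be sketched as follows. This is only a minimal illustration, assuming the original classification is stored alongside the trained status; the names MessageRecord, Stats and train are hypothetical and are not the actual message db or Outlook addin code. It is written for current Python 3 rather than the Python 2.3/2.4 of the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MessageRecord:
    """Hypothetical per-message record: what the classifier first said,
    and what (if anything) the user has trained the message as."""
    original: str                      # "ham", "spam" or "unsure"
    trained_as: Optional[str] = None   # "ham", "spam" or None


@dataclass
class Stats:
    false_positives: int = 0
    false_negatives: int = 0

    def _adjust(self, original: str, trained_as: Optional[str], delta: int) -> None:
        # Only a training label that disagrees with the original
        # classification counts as a classification error.
        if trained_as is None or trained_as == original:
            return
        if original == "ham" and trained_as == "spam":
            self.false_negatives += delta
        elif original == "spam" and trained_as == "ham":
            self.false_positives += delta

    def train(self, record: MessageRecord, new_label: str) -> None:
        # Undo whatever the previous training contributed, then apply the
        # new label, so training back to the original classification
        # erases the earlier error.
        self._adjust(record.original, record.trained_as, -1)
        self._adjust(record.original, new_label, +1)
        record.trained_as = new_label
```

With this shape, training a ham-classified message as spam adds one false negative, and training it back to ham removes it again. Keeping both fields in one record also means both can be cleared together when the training data is reset.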
-- 
Kenny Pitt

From seandarcy at hotmail.com Wed Nov 24 00:27:58 2004
From: seandarcy at hotmail.com (sean darcy)
Date: Wed Nov 24 00:28:06 2004
Subject: [spambayes-dev] sb_server msg.asTokens() should be msg.tokenize()?
Message-ID:

From cvs today I got this:

sb_server.py
SpamBayes POP3 Proxy Version 1.0rc1 (May 2004)
and engine SpamBayes Engine Version 0.3 (January 2004).
..............................
User interface url is http://localhost:8880/
Traceback (most recent call last):
  File "/usr/bin/sb_server.py", line 476, in onRetr
    (prob, clues) = state.bayes.spamprob(msg.asTokens(),\
AttributeError: SBHeaderMessage instance has no attribute 'asTokens'

looks like that should be msg.tokenize().

sean

From tameyer at ihug.co.nz Wed Nov 24 00:37:54 2004
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Wed Nov 24 00:38:30 2004
Subject: [spambayes-dev] sb_server msg.asTokens() should be msg.tokenize()?
In-Reply-To:
Message-ID:

> From cvs today I got this:
>
> sb_server.py
> SpamBayes POP3 Proxy Version 1.0rc1 (May 2004)
> and engine SpamBayes Engine Version 0.3 (January 2004).
> ..............................
> User interface url is http://localhost:8880/
> Traceback (most recent call last):
> File "/usr/bin/sb_server.py", line 476, in onRetr
> (prob, clues) = state.bayes.spamprob(msg.asTokens(),\
> AttributeError: SBHeaderMessage instance has no attribute 'asTokens'
>
> looks like that should be msg.tokenize().

Sorry - somehow I missed sb_server.py when I was checking in the change that caused that. I've checked it in now. Thanks for pointing it out.

=Tony.Meyer

From tameyer at ihug.co.nz Wed Nov 24 01:18:28 2004
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Wed Nov 24 01:19:04 2004
Subject: [spambayes-dev] Release 1.0.1
In-Reply-To:
Message-ID:

> > If anyone wants to check these out, that would be
> > great
> [..]
> >
> >
> Actually, I think there is a problem with the .exe - but the
> source ones should be fine.
> Time to go home now, so I'll
> investigate further tomorrow.

I believe I have resolved this. Both the source and the binary appear to work for me now.

=Tony.Meyer

From davef at henge.com Wed Nov 24 06:09:46 2004
From: davef at henge.com (Dave Fox)
Date: Wed Nov 24 06:09:48 2004
Subject: [spambayes-dev] hammie.db
Message-ID: <021001c4d1e3$d0e90510$0502a8c0@powerstudio>

The FAQ mentions hammie.db a number of times but my database files are called default_bayes_database.db and default_message_database.db (I'm using the Outlook plugin). Should the FAQ be updated?

Dave

ps. Thanks for a great weapon to use against spam.

From tameyer at ihug.co.nz Wed Nov 24 06:20:22 2004
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Wed Nov 24 06:21:01 2004
Subject: [spambayes-dev] hammie.db
In-Reply-To:
Message-ID:

> The FAQ mentions hammie.db a number of times

4, in total.

> but my database files are called default_bayes_database.db
> and default_message_database.db (I'm using the Outlook plugin).
> Should the FAQ be updated?

The first reference explains what the database is called with the Outlook plugin, so needs no change. The second one is only about VM, so needs no change. The third one is about retraining from scratch, and has instructions for Outlook users right above it, so needs no change. The fourth one is an example of specifying the database name on the command line, which is not relevant to Outlook, so needs no change.

Have I missed something? I can't see any changes that are necessary.
If you can point some out, though, I'm happy to do it :) =Tony.Meyer From tameyer at ihug.co.nz Wed Nov 24 06:23:54 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Nov 24 06:24:28 2004 Subject: [spambayes-dev] Incorrect Outlook stats In-Reply-To: Message-ID: > > AFAICT the only way** to fix this would be to add more > > information to the message_db (a la the non-Outlook version). > > I believe we can do this in a backwards-compatible way, > > although there will be a reasonable number of changes, I suspect. > > Should I go ahead and do this? > > I was afraid that would be the case. I wouldn't be opposed > to adding an "original score" field to the message if that's > the easiest way, but I suspect that putting it into the > message db would be a better approach. Yes, I agree. > The decisions we make > about what stats to update would be controlled by both the > original classification and the training status of the > message, so it seems like it would be best to reset all of > this when the training data is reset. That would be much > easier to do in the message db than in the fields of the messages. In addition, there are sometimes problems with storing the field (Hotmail & IMAP particularly, I gather, neither of which I use with Outlook), and avoiding those would be good. It would make the code for the 'show clues' function a little simpler, IIRC. In the past I've wondered about storing the id for the original folder in there, too, to get around the problems that there are with that at times. We'd still want to have the field for display purposes, but could maybe limit it to that. I'll work up something and see how it goes :) =Tony.Meyer From ta-meyer at ihug.co.nz Thu Nov 25 01:07:20 2004 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Thu Nov 25 01:07:58 2004 Subject: [spambayes-dev] Is building with Python 2.4 and not Python 2.3 a bugfix? In-Reply-To: Message-ID: > > So my question is: is building with the newer Python valid > > for a bugfix release? 
>
> It fixes bugs, right? (like "the end of problems with
> malformed messages")

Yes, although that's not the only thing it does.

> Micro releases of Python on Windows sometimes ship with new external
> libraries too, e.g., to fix Tk bugs, or to plug zlib "security holes".
> This seems much the same.

That's good enough precedent for me :)

> There's a small chance that 2.4 will
> introduce a relevant bug, and then the decision to use it would look
> imprudent. In that case you'll be blamed.

I can handle that. The chance is very small, I think, since I use 2.4 myself and haven't run into anything (since fixing the initial problems).

> But if all it does is improve email handling, nobody will praise you.

I think it will be unnoticeably faster, too.

> So there's the moral
> dilemma: you can improve peoples' lives, or cover your ass. Since
> it's not my ass, I vote you expose it.

I have no fear. If it all goes to custard, then at least I can add that to the 'famous last words of spambayes developers' section on the website quotes page.

So I'll do this - but I'll be prudent enough to wait for 2.4 final and not use 2.4rc1 for the release I'll do today...

Thanks for the comments :)

=Tony.Meyer

From ta-meyer at ihug.co.nz Thu Nov 25 07:59:17 2004
From: ta-meyer at ihug.co.nz (Tony Meyer)
Date: Thu Nov 25 07:59:56 2004
Subject: [spambayes-dev] ANNOUNCE: SpamBayes release 1.0.1
Message-ID:

The SpamBayes team is pleased to announce release 1.0.1 of SpamBayes. As is now usual, this is both a release of the source code and of an installation program for all Microsoft Windows users.

This is a bug-fix release that fixes a number of minor issues with the 1.0 release, but includes no new functionality, and is entirely compatible with the 1.0 release. A 1.1a1 release, including many new features, will probably be released early in the new year.
Details about the bugs that have been fixed in this release can be found at
https://sourceforge.net/project/shownotes.php?release_id=285346

You can get the release via the 'Download' page at
http://spambayes.org/download.html

Enjoy the new release and your spam-free mailbox :-)

As always, thanks to everyone involved in this release.

Tony. (on behalf of the SpamBayes team)

--- What is SpamBayes? ---

The SpamBayes project is working on developing a Bayesian (of sorts) anti-spam filter (in Python), initially based on the work of Paul Graham, but since modified with ideas from Robinson, Peters, et al.

The project includes a number of different applications, all using the same core code, ranging from a plug-in for Microsoft Outlook, to a POP3 proxy, to various command-line tools.

The Windows installation program will install either the Outlook add-in (for Microsoft Outlook users), or the SpamBayes server program (for all other POP3 mail client users, including Microsoft Outlook Express). All Windows users (including existing users of the Outlook add-in) are encouraged to use the installation program.

If you wish to use the source-code version, you will also need to install Python - see README.txt in the source tree for more information.

From tameyer at ihug.co.nz Fri Nov 26 00:32:07 2004
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Fri Nov 26 00:32:44 2004
Subject: [spambayes-dev] Incorrect Outlook stats
In-Reply-To:
Message-ID:

> I'll work up something and see how it goes :)

I've checked in changes to use the spambayes.message MessageInfo classes rather than the custom Outlook one (it should all be seamlessly backwards compatible). The test script still runs as it did before, and using the plug-in still appears to work here.

The next step is to store more than just the training data, but that's pretty simple now. I may not get to that until next week, though.
Then I can fix the stats problem etc :) Let me know if you experience any troubles after the next time you CVS up :) =Tony.Meyer From hatukanezumi at users.sourceforge.net Sat Nov 27 07:37:54 2004 From: hatukanezumi at users.sourceforge.net (Hatuka*nezumi) Date: Sat Nov 27 08:05:23 2004 Subject: [spambayes-dev] small bugs Message-ID: <20041127153754.4cd2bc9a.hatukanezumi@users.sourceforge.net> I found some slight bugs on sb_server. o Subjects in clues/tokens table aren't cgi.escape()'d so cause xmllib.Error. o Messages containing only header part and no CRLF separator (maybe broken, unsuccessful spam etc.) aren't filtered. --- nezumi From tameyer at ihug.co.nz Mon Nov 29 01:18:25 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Nov 29 01:18:48 2004 Subject: [spambayes-dev] small bugs In-Reply-To: Message-ID: > I found some slight bugs on sb_server. > > o Subjects in clues/tokens table aren't cgi.escape()'d so > cause xmllib.Error. Fixed in CVS head. I'll backport this for 1.0.2 at some point. > o Messages containing only header part and no CRLF separator > (maybe broken, unsuccessful spam etc.) aren't filtered. Fixed in CVS head. This changes behaviour somewhat, so I'm reluctant to backport, but the fix will appear in 1.1. Thanks! =Tony.Meyer From ta-meyer at ihug.co.nz Mon Nov 29 23:01:49 2004 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Mon Nov 29 23:02:26 2004 Subject: [spambayes-dev] Download verification checking request Message-ID: As per [ 1061119 ] Provide verification and/or security measures for downloads the 1.0.1 release (and future releases) includes some measures to try and let people confirm that their download is valid. Specifically, the file sizes and MD5 checksums are listed on the download page: and there are PGP signatures for each of the three files, signed by the release manager (me, for this one). I'm pretty new to this stuff - I'm pretty sure that the MD5s and sizes are correct, but not 100% about the signatures. 
If someone familiar with this stuff could use the signatures to verify the files, and let me know (either here or via the tracker above) so that I know I've done the process correctly, that would be fantastic. Thanks! =Tony.Meyer -- Please always include the list (spambayes@python.org) in your replies (reply-all), and please don't send me personal mail about SpamBayes. http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this. From tameyer at ihug.co.nz Mon Nov 29 23:54:59 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Nov 29 23:56:12 2004 Subject: [spambayes-dev] RE: [Spambayes] spambayes and kmail In-Reply-To: Message-ID: > This is all pretty damn cool, in my opinion... Agreed :) [...] > (although, obviously without the nice admin > GUI - perhaps it would be worth writing code to generate a prefs GUI > using different toolkits? There's code in shtoom that does this that > could be lifted... I'm interested in this. One thing I'd really like to have for 1.1 is a prefs GUI (for me, I'm thinking about sb_pop3dnd and sb_imapfilter in particular, but the code should be able to handle whatever set of options). I was thinking about putting something together based on the dialog code that Outlook uses, but a non-win32-specific solution would be much better. If I download shtoom, will it be pretty simple to find the relevant code to have a look at it? =Tony.Meyer From anthony at interlink.com.au Tue Nov 30 04:16:15 2004 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Nov 30 04:17:42 2004 Subject: [spambayes-dev] Re: [Spambayes] Download verification checking request In-Reply-To: References: Message-ID: <200411301416.17348.anthony@interlink.com.au> Suggestions: Use an ascii sig (gpg -ba) Put the sig files on the website directly, they're too small to justify putting in the Files section. Put your key id on the download page - that way people can more easily fetch it from the keyservers. 
From tameyer at ihug.co.nz Tue Nov 30 06:59:54 2004
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Nov 30 07:01:11 2004
Subject: [spambayes-dev] Re: [Spambayes] Download verification checking request
In-Reply-To:
Message-ID:

> Suggestions:

Thanks :)

> Use an ascii sig (gpg -ba)

Done.

> Put the sig files on the website directly, they're too small
> to justify putting in the Files section.

Done.

> Put your key id on the download page - that way people can
> more easily fetch it from the keyservers.

Done.

I swapped to GnuPG, too, because using the command line was just so much
easier than using the GUI PGP. I've added instructions about doing this to
README-DEVEL.txt, so hopefully it should all go smoothly next time :)

=Tony.Meyer

From scott at lentigo.net Tue Nov 30 14:38:00 2004
From: scott at lentigo.net (Scott Burns)
Date: Tue Nov 30 14:38:13 2004
Subject: [spambayes-dev] Re: [Spambayes] Download verification checking request
In-Reply-To:
References:
Message-ID: <41AC77B8.2080904@lentigo.net>

I grabbed your public key, the spambayes-1.0.1.tar.gz tarball, and
spambayes-1.0.1.tar.gz.asc sig file. I followed the instructions on the
website (testing those, too) and everything worked just fine.

...s.

Tony Meyer wrote:
> As per
>
> [ 1061119 ] Provide verification and/or security measures for downloads
>
>
> the 1.0.1 release (and future releases) includes some measures to try and
> let people confirm that their download is valid. Specifically, the file
> sizes and MD5 checksums are listed on the download page:
>
>
>
> and there are PGP signatures for each of the three files, signed by the
> release manager (me, for this one).
>
> I'm pretty new to this stuff - I'm pretty sure that the MD5s and sizes are
> correct, but not 100% about the signatures. If someone familiar with this
> stuff could use the signatures to verify the files, and let me know (either
> here or via the tracker above) so that I know I've done the process
> correctly, that would be fantastic.
>
> Thanks!
>
> =Tony.Meyer

--
Scott Burns
pub 1024D/9DA64618 2001-11-17 Scott Burns
Fingerprint: 2F1B A22E 33C3 FD3D BBE4 D5E2 728B 4753 9DA6 4618
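
The download checks discussed in this thread (file size, MD5 checksum, then the PGP signature) could be scripted along the following lines. This is a minimal sketch rather than the instructions from the website: check_download and gpg_verify_command are hypothetical names, and the expected size and MD5 have to be copied from the download page by whoever runs it.

```python
import hashlib
import os


def check_download(path, expected_size, expected_md5):
    """Return True if the file has the published size and MD5 checksum."""
    # Cheap check first: a truncated download fails here without hashing.
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large installers do not need to fit in memory.
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.strip().lower()


def gpg_verify_command(sig_path, data_path):
    """Build the gpg command that checks a detached signature.

    The signer's public key must already be in the local keyring,
    for example fetched from a keyserver by key id.
    """
    return ["gpg", "--verify", sig_path, data_path]
```

The size and MD5 only show that the file matches what the page lists; the gpg --verify step (run, for instance, with subprocess.run(..., check=True)) is what ties the file to the release manager's key.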