From metatracker at psf.upfronthosting.co.za Tue Jul 24 23:35:39 2007 From: metatracker at psf.upfronthosting.co.za (Erik Forsberg) Date: Tue, 24 Jul 2007 21:35:39 -0000 Subject: [Tracker-discuss] [issue110] Showstopper: sf.net is generating invalid XML In-Reply-To: Message-ID: <46A670B1.1070200@efod.se> Erik Forsberg added the comment: Hi Folks, long time, no see. Finally, I found a combination of free time (vacation + bad weather) and motivation to rewrite the importer to the new format. http://bugs.python.org now has a newly imported snapshot. Feel free to find my new importer bugs :-), and please report them in the meta tracker (http://psf.upfronthosting.co.za/roundup/meta/). Regards, \EF _______________________________________________________ Meta Tracker _______________________________________________________ From brett at python.org Tue Jul 24 23:44:52 2007 From: brett at python.org (Brett Cannon) Date: Tue, 24 Jul 2007 14:44:52 -0700 Subject: [Tracker-discuss] [issue110] Showstopper: sf.net is generating invalid XML In-Reply-To: <46A670B1.1070200@efod.se> References: <46A670B1.1070200@efod.se> Message-ID: Great! Thanks, Erik! And with the update for SF landing soon maybe we can actually lock down the transition plan and get this thing done soon (i.e., before 2008 =). On 7/24/07, Erik Forsberg wrote: > > Erik Forsberg added the comment: > > Hi Folks, long time, no see. > > Finally, I found a combination of free time (vacation + bad weather) and > motivation to rewrite the importer to the new format. > > http://bugs.python.org now has a newly imported snapshot. Feel free to > find my new importer bugs :-), and please report them in the meta > tracker (http://psf.upfronthosting.co.za/roundup/meta/).
> > Regards, > \EF > > _______________________________________________________ > Meta Tracker > > _______________________________________________________ > _______________________________________________ > Tracker-discuss mailing list > Tracker-discuss at python.org > http://mail.python.org/mailman/listinfo/tracker-discuss > From metatracker at psf.upfronthosting.co.za Tue Jul 24 23:44:55 2007 From: metatracker at psf.upfronthosting.co.za (Brett C.) Date: Tue, 24 Jul 2007 21:44:55 -0000 Subject: [Tracker-discuss] [issue110] Showstopper: sf.net is generating invalid XML In-Reply-To: <46A670B1.1070200@efod.se> Message-ID: Brett C. added the comment: Great! Thanks, Erik! And with the update for SF landing soon maybe we can actually lock down the transition plan and get this thing done soon (i.e., before 2008 =). On 7/24/07, Erik Forsberg wrote: > > Erik Forsberg added the comment: > > Hi Folks, long time, no see. > > Finally, I found a combination of free time (vacation + bad weather) and > motivation to rewrite the importer to the new format. > > http://bugs.python.org now has a newly imported snapshot. Find free to > find my new importer bugs :-), and please report them in the meta > tracker (http://psf.upfronthosting.co.za/roundup/meta/). > > Regards, > \EF > > _______________________________________________________ > Meta Tracker > > _______________________________________________________ > _______________________________________________ > Tracker-discuss mailing list > Tracker-discuss at python.org > http://mail.python.org/mailman/listinfo/tracker-discuss > _______________________________________________________ Meta Tracker _______________________________________________________ From skip at pobox.com Wed Jul 25 00:17:48 2007 From: skip at pobox.com (skip at pobox.com) Date: Tue, 24 Jul 2007 17:17:48 -0500 Subject: [Tracker-discuss] spam filter? 
Message-ID: <18086.31372.634266.38728@montanaro.dyndns.org> Has anyone had a look at the SpamBayes spam server I implemented? It's in the latest alpha version of SB (1.1a4). Skip From forsberg at efod.se Wed Jul 25 00:29:12 2007 From: forsberg at efod.se (Erik Forsberg) Date: Wed, 25 Jul 2007 00:29:12 +0200 Subject: [Tracker-discuss] spam auditor checked in In-Reply-To: <18032.5257.477481.918007@montanaro.dyndns.org> References: <18028.7288.172843.721394@montanaro.dyndns.org> <1181721411.10879.7.camel@localhost.localdomain> <18032.5257.477481.918007@montanaro.dyndns.org> Message-ID: <46A67D38.4030600@efod.se> skip at pobox.com skrev: > BAYESCUSTOMIZE=$SBDIR/bayescustomize.ini core_server.py -m XMLRPCPlugin > The xmlrpc server has been installed on psf.upfronthosting.co.za as detailed in your message, using a cvs checkout from an hour ago. Seems to work. Now I think it needs training. Ideas on how to do that? I also modified the detector slightly to make it read its configuration from detectors/config.ini instead of using hardcoded values in the .py. The server at www.webfast.com gives me an 404. Also, I'm a bit confused on how the detector works - could you explain the arguments the XMLRPC method expects? Is the first argument supposed to be a string, or something else? Regards, \EF From skip at pobox.com Wed Jul 25 03:18:43 2007 From: skip at pobox.com (skip at pobox.com) Date: Tue, 24 Jul 2007 20:18:43 -0500 Subject: [Tracker-discuss] spam auditor checked in In-Reply-To: <46A67D38.4030600@efod.se> References: <18028.7288.172843.721394@montanaro.dyndns.org> <1181721411.10879.7.camel@localhost.localdomain> <18032.5257.477481.918007@montanaro.dyndns.org> <46A67D38.4030600@efod.se> Message-ID: <18086.42227.992337.67040@montanaro.dyndns.org> Erik> The xmlrpc server has been installed on psf.upfronthosting.co.za Erik> as detailed in your message, using a cvs checkout from an hour Erik> ago. Seems to work. Note that we switched from CVS to Subversion a couple days ago. 
I don't think there are any significant differences yet (only my trivial test checkins), but you should track the Subversion repository. (I don't know how to completely disable the CVS repository on SF. Is it even possible?) Erik> Now I think it needs training. Ideas on how to do that? Yes, there are two ways to train. First, there are train and train_mime methods in the XML-RPC server. Second, and certainly more convenient to start with, point your web browser at the URL the server displays when it starts up, probably http://localhost:8880/. (Try http://www.webfast.com/sbmanage/ now.) Your detector should probably be set up to only reject submissions which score as spam. Erik> The server at www.webfast.com gives me an 404. Ah, yes, that wasn't running. I've restarted it. Note however that I was unsuccessful getting the XML-RPC server running behind my Apache reverse proxy. My Apache chops are pretty rusty. I was only working on getting the server running on www.webfast.com because I didn't have direct access to the tracker server. If you can manage it, you'd be better off running it on the same server as the tracker. Only the web interface URL needs to be exposed beyond the localhost (so the tracker admins can train the submissions). That should be protected by Apache authentication. Erik> Also, I'm a bit confused on how the detector works - could you Erik> explain the arguments the XMLRPC method expects? Is the first Erik> argument supposed to be a string, or something else? The score method takes three arguments, a dictionary representing the form submission contents, a possibly empty list of extra tokens which you generate, and a list of attachment dictionaries. See the docstring for spambayes.XMLRPCPlugin.form_to_mime. 
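A rough sketch of what a call with those three arguments might look like from the detector side. Only the method name score and the three-argument shape come from the description above. The helper names, the form field names and the endpoint URL mentioned below are placeholders, and the contents of an attachment dictionary are left open (the form_to_mime docstring is the place to check). It assumes a current Python with xmlrpc.client.

```python
import xmlrpc.client


def connect(url):
    """Create a proxy for the SpamBayes XML-RPC server.

    The URL is whatever core_server.py reports when it starts; nothing is
    contacted until a method is actually called on the proxy.
    """
    return xmlrpc.client.ServerProxy(url)


def build_score_args(form, extra_tokens=None, attachments=None):
    """Return the (form, extra_tokens, attachments) triple that score() takes.

    form:         dict with the form submission contents
    extra_tokens: possibly empty list of tokens generated by the detector
    attachments:  list of attachment dictionaries
    """
    return (dict(form), list(extra_tokens or []), list(attachments or []))


def score_submission(server, form, extra_tokens=None, attachments=None):
    """Pass one submission to server.score() and return whatever it returns.

    `server` is anything with a score(form, extra_tokens, attachments)
    method, for example the proxy returned by connect().
    """
    return server.score(*build_score_args(form, extra_tokens, attachments))
```

With a running server this would be used along the lines of score_submission(connect("http://localhost:PORT/"), {"title": "...", "note": "..."}, ["user:anonymous"]), where PORT and the field names are made up here. Keeping the proxy as a parameter means the detector can be exercised against a stub without a server.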
I also put my test script on the webfast server: http://www.webfast.com/~skip/checkmimemsg.py My intention is that file uploads are transferred in the attachments dictionary as compound data while the normal form data are transferred in the form dictionary. The extra_tokens list should consist of synthetic tokens your detector generates, such as "user:anonymous" or "user:skip" to indicate the login status or "userage:N" where N is something like the log of the number of seconds since the logged in user was registered. One thing I'm unclear how to do is to recover from a submission which is misclassified as spam. You somehow need to recover the contents of that form from somewhere and resubmit the contents. I sort of think this has to happen in the detector. Skip From metatracker at psf.upfronthosting.co.za Wed Jul 25 07:07:46 2007 From: metatracker at psf.upfronthosting.co.za (Myroslav Opyr) Date: Wed, 25 Jul 2007 05:07:46 -0000 Subject: [Tracker-discuss] [issue119] Tracker Documentation Message-ID: <1185340066.06.0.0886892361436.issue119@psf.upfronthosting.co.za> New submission from Myroslav Opyr: Tracker Documentation (http://wiki.python.org/moin/TrackerDocs/) linked on bugs.python.org sidebar should be updated before launch. It should include valid "Getting a Developer account under Roundup" section. And due to recent importer update (#110), there is a chance that "About Differences between SF and Roundup" and "Fields" sections need an update as well, thus adding Eric, to nosy list... 
---------- messages: 640 nosy: forsberg, myroslav priority: bug status: unread title: Tracker Documentation _______________________________________________________ Meta Tracker _______________________________________________________ From forsberg at efod.se Wed Jul 25 13:08:49 2007 From: forsberg at efod.se (Erik Forsberg) Date: Wed, 25 Jul 2007 13:08:49 +0200 Subject: [Tracker-discuss] spam auditor checked in In-Reply-To: <18086.42227.992337.67040@montanaro.dyndns.org> References: <18028.7288.172843.721394@montanaro.dyndns.org> <1181721411.10879.7.camel@localhost.localdomain> <18032.5257.477481.918007@montanaro.dyndns.org> <46A67D38.4030600@efod.se> <18086.42227.992337.67040@montanaro.dyndns.org> Message-ID: <46A72F41.2090401@efod.se> skip at pobox.com skrev: > Erik> The xmlrpc server has been installed on psf.upfronthosting.co.za > Erik> as detailed in your message, using a cvs checkout from an hour > Erik> ago. Seems to work. > > Note that we switched from CVS to Subversion a couple days ago. I don't > think there are any significant differences yet (only my trivial test > checkins), but you should track the Subversion repository. Ah. Good thing :-). http://spambayes.sourceforge.net/download.html needs an update, though. > Erik> Now I think it needs training. Ideas on how to do that? > > Yes, there are two ways to train. First, there are train and train_mime > methods in the XML-RPC server. Second, and certainly more convenient to > start with, I'm a programmer. For me, an xmlrpc interface is always more convenient than a web interface :-). > point your web browser at the URL the server displays when it > starts up, probably http://localhost:8880/. I got that running, yes. And I fully agree that it's better if the spambayes server is running on localhost, as we don't want too many external dependencies. As its now up and running on localhost, feel free to turn off the instance on www.webfast.com. 
> Erik> Also, I'm a bit confused on how the detector works - could you > Erik> explain the arguments the XMLRPC method expects? Is the first > Erik> argument supposed to be a string, or something else? > > The score method takes three arguments, a dictionary representing the form > submission contents, a possibly empty list of extra tokens which you > generate, and a list of attachment dictionaries. See the docstring for > spambayes.XMLRPCPlugin.form_to_mime. > Ah! Now I understand how it works. I was looking in scripts/sb_xmlrpcserver.py which is installed in the bin/ directory. I should have been looking in XMLRPCPlugin.py. Is sb_xmlrpcserver.py perhaps deprecated and on the list of things to be removed? > I also put my test script on the webfast server: > > http://www.webfast.com/~skip/checkmimemsg.py > > My intention is that file uploads are transferred in the attachments > dictionary as compound data while the normal form data are transferred in > the form dictionary. The extra_tokens list should consist of synthetic > tokens your detector generates, such as "user:anonymous" or "user:skip" to > indicate the login status or "userage:N" where N is something like the log > of the number of seconds since the logged in user was registered. > > One thing I'm unclear how to do is to recover from a submission which is > misclassified as spam. You somehow need to recover the contents of that > form from somewhere and resubmit the contents. I sort of think this has to > happen in the detector. > Hmm.. In a complete system, I think it should work as follows: *) An attribute, 'spambayes_score', is added to the file and msg classes (in schema.py). Guess what this attribute will hold.. :-). A boolean attribute 'spambayes_misclassified' should also be added. *) A detector is added that reacts on instances of the file and msg classes. 
When it fires, it contacts the Spambayes XMLRPC Server and gets a score based on the contents and some synthetic tokens.
*) The web pages of the tracker should be modified to not display file and msg instances that are classified as spam for anonymous users. Instead a message should be displayed that tells the user that the file or msg has been classified as spam, and that the user should log in and press a button to alert a coordinator if the message is incorrectly classified.
*) The web pages should, for logged-in users, display a button that allows ordinary users to alert administrators that a msg/file is misclassified, by setting the 'spambayes_misclassified' attribute. A detector should send mail to coordinators when this happens.
*) For coordinators, the web pages should provide buttons for "train as ham" and "train as spam", and when one of these is pressed, the 'spambayes_misclassified' bool should be set to false. For the training buttons to work, one or two new web actions are needed. They are written as Python scripts in the extensions directory of the tracker.
*) The detectors sending e-mail to various e-mail lists (and to the nosy list) should not send mail when a message is classified as spam. However, if a message was misclassified as spam, they should in an ideal world re-send the message when the message is retrained as ham. The latter might be tricky, though.
*) Issues that only have msg/file instances that are spam should probably not be displayed in the tracker.
This is quite a lot of work, of course, especially if you're new to Roundup. Let me think about this to
Message-ID: <46A73180.8040007@efod.se> Erik Forsberg added the comment: Hopefully, the new importer produces the same result as the old one, so there should be no need for documentation updates. Feel free to prove me wrong, especially if your proving includes rigorous testing and comparisons of the data in SourceForge's tracker, and the data on http://bugs.python.org :-).
Regards, \EF ---------- nosy: -forsberg status: unread -> chatting _______________________________________________________ Meta Tracker _______________________________________________________ From metatracker at psf.upfronthosting.co.za Wed Jul 25 13:31:30 2007 From: metatracker at psf.upfronthosting.co.za (Myroslav Opyr) Date: Wed, 25 Jul 2007 11:31:30 -0000 Subject: [Tracker-discuss] [issue119] Tracker Documentation In-Reply-To: <46A73180.8040007@efod.se> Message-ID: <50e282530707250431o14e5d714n4871ce141f9272aa@mail.gmail.com> Myroslav Opyr added the comment: I did no testing; I was just revisiting the tracker status, found that things had started moving while the documentation was still incomplete for the launch, and thought that mentioning the recent change could help ensure that the documentation is up to date with regard to the data migration procedure. Regards, m. On 7/25/07, Erik Forsberg wrote: > > > Erik Forsberg added the comment: > > Hopefully, the new importer produces the same result as the old one, so > there should be no need for documentation updates. > > Feel free to prove me wrong, especially if your proving includes > rigorous testing and comparisons of the data in sourceforge's tracker, > and the data on http://bugs.python.org :-). > > Regards, > \EF > > ---------- > nosy: -forsberg > status: unread -> chatting > > _______________________________________________________ > Meta Tracker > > _______________________________________________________ > _______________________________________________________ Meta Tracker _______________________________________________________ -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/tracker-discuss/attachments/20070725/b696545c/attachment.html From skip at pobox.com Wed Jul 25 15:20:32 2007 From: skip at pobox.com (skip at pobox.com) Date: Wed, 25 Jul 2007 08:20:32 -0500 Subject: [Tracker-discuss] spam auditor checked in In-Reply-To: <46A72F41.2090401@efod.se> References: <18028.7288.172843.721394@montanaro.dyndns.org> <1181721411.10879.7.camel@localhost.localdomain> <18032.5257.477481.918007@montanaro.dyndns.org> <46A67D38.4030600@efod.se> <18086.42227.992337.67040@montanaro.dyndns.org> <46A72F41.2090401@efod.se> Message-ID: <18087.20000.777804.122533@montanaro.dyndns.org> >> Note that we switched from CVS to Subversion a couple days ago. I >> don't think there are any significant differences yet (only my >> trivial test checkins), but you should track the Subversion >> repository. Erik> Ah. Good thing :-). http://spambayes.sourceforge.net/download.html Erik> needs an update, though. Thanks, I'll fix that up. Erik> Now I think it needs training. Ideas on how to do that? >> >> Yes, there are two ways to train. First, there are train and >> train_mime methods in the XML-RPC server. Second, and certainly more >> convenient to start with, Erik> I'm a programmer. For me, an xmlrpc interface is always more Erik> convenient than a web interface :-). >> point your web browser at the URL the server displays when it >> starts up, probably http://localhost:8880/. Erik> I got that running, yes. And I fully agree that it's better if the Erik> spambayes server is running on localhost, as we don't want too many Erik> external dependencies. As its now up and running on localhost, feel free Erik> to turn off the instance on www.webfast.com. Erik> Also, I'm a bit confused on how the detector works - could you Erik> explain the arguments the XMLRPC method expects? Is the first Erik> argument supposed to be a string, or something else? 
>> >> The score method takes three arguments, a dictionary representing the form >> submission contents, a possibly empty list of extra tokens which you >> generate, and a list of attachment dictionaries. See the docstring for >> spambayes.XMLRPCPlugin.form_to_mime. >> Erik> Ah! Now I understand how it works. I was looking in Erik> scripts/sb_xmlrpcserver.py which is installed in the bin/ directory. I Erik> should have been looking in XMLRPCPlugin.py. Is sb_xmlrpcserver.py Erik> perhaps deprecated and on the list of things to be removed? Yeah, it's kind of ancient. I'm not aware of anyone who uses it these days. It does have the advantage of being more lightweight than the core_server (no web stuff). Erik> *) An attribute, 'spambayes_score', is added to the file and msg Erik> classes (in schema.py). Guess what this attribute will Erik> hold.. :-). A boolean attribute 'spambayes_misclassified' should Erik> also be added. When do you know it's been misclassified? My thought would be that you have to save all submissions which score as spam for some period of time, probably with some unique identifier (an incrementing counter would be sufficient). That unique identifier has to propagate to the SpamBayes server. Later on, if you determine that a submission was misclassified, you use that unique id to retrieve the info you saved and pump it into the tracker. Erik> *) A detector is added that reacts on instances of the file and Erik> msg classes. When it fires, it contacts the Spambayes XMLRPC Erik> Server and gets a score based on the contents and some synthetic Erik> tokens. Yup. Erik> *) The web pages of the tracker should be modified to not display Erik> file and msg instances that are classified as spam for anonymous Erik> users.
Instead a message should be displayed that tells the user Erik> that the file or msg has been classified as spam, and that the Erik> user should login and press a button to alert an coordinator if Erik> the message is incorrectly classified. I would hide all submissions which score as spam, whether anonymous or known. Only admins should be able to see spam submissions. ... Erik> This is quite a lot of work, of course, especially if you're new to Erik> roundup. Let me think about this to something simpler. Yeah, that's pretty much beyond my capability. I simply don't have the time to become a Roundup expert. Skip From forsberg at efod.se Wed Jul 25 17:28:56 2007 From: forsberg at efod.se (Erik Forsberg) Date: Wed, 25 Jul 2007 17:28:56 +0200 Subject: [Tracker-discuss] spam auditor checked in In-Reply-To: <18087.20000.777804.122533@montanaro.dyndns.org> References: <18028.7288.172843.721394@montanaro.dyndns.org> <1181721411.10879.7.camel@localhost.localdomain> <18032.5257.477481.918007@montanaro.dyndns.org> <46A67D38.4030600@efod.se> <18086.42227.992337.67040@montanaro.dyndns.org> <46A72F41.2090401@efod.se> <18087.20000.777804.122533@montanaro.dyndns.org> Message-ID: <46A76C38.3090609@efod.se> skip at pobox.com skrev: > Erik> *) An attribute, 'spambayes_score', is added to the file and msg > Erik> classes (in schema.py). Guess what this attribute will > Erik> hold.. :-). A boolean attribute 'spambayes_misclassified' should > Erik> also be added. > > When do you know it's been misclassified? My thought would be that you have > to save all submissionss which score as spam for some period of time, > probably with some unique identifier (an incrementing counter would be > sufficient). That unique identifier has to propagate to the SpamBayes > server. Later on, if you determine that a submission was misclassifed, you > use that unique id to retrieve the info you saved and pump it into the > tracker. 
> My idea was to set it to False for all file/msg instances that have been successfully classified, and then add a button that allows ordinary users to tag the file/msg as misclassified, which would allow a coordinator to visit the message and press either a 'mark as spam' or a 'mark as ham' button. The former would set spambayes_score to 1.0 and submit the message for training as spam. The latter would set spambayes_score to 0.0 and submit the message for training as ham. Both would clear the spambayes_misclassified flag (set it to False). Does this sound reasonable to you? > I would hide all submissions which score as spam, whether anonymous or > known. Only admins should be able to see spam submissions. > Yeah, that's probably the best way to do it. > Erik> This is quite a lot of work, of course, especially if you're new to > Erik> roundup. Let me think about this to Erik> something simpler. > > Yeah, that's pretty much beyond my capability. I simply don't have the time > to become a Roundup expert. > Well, I'll see if I can find the time to do some of the work. Depends a bit on the weather.. :-). I'll be very happy if you can contribute some of your knowledge by inspecting my code and answering my questions. It's been a while since I did anti-spam stuff. Fiddled a lot with SMTP filters and SpamAssassin some years ago. This feature wakes up some of the interest I had in the subject. On the matter of training - will SpamBayes work best if it gets trained on about the same number of spam messages as ham messages? That is, if we're training it on 5 spam messages, should we make sure we also train it on 5 ham messages?
Regards, \EF From skip at pobox.com Wed Jul 25 18:40:18 2007 From: skip at pobox.com (skip at pobox.com) Date: Wed, 25 Jul 2007 11:40:18 -0500 Subject: [Tracker-discuss] spam auditor checked in In-Reply-To: <46A76C38.3090609@efod.se> References: <18028.7288.172843.721394@montanaro.dyndns.org> <1181721411.10879.7.camel@localhost.localdomain> <18032.5257.477481.918007@montanaro.dyndns.org> <46A67D38.4030600@efod.se> <18086.42227.992337.67040@montanaro.dyndns.org> <46A72F41.2090401@efod.se> <18087.20000.777804.122533@montanaro.dyndns.org> <46A76C38.3090609@efod.se> Message-ID: <18087.31986.211279.285383@montanaro.dyndns.org> Erik> skip at pobox.com skrev: Erik> *) An attribute, 'spambayes_score', is added to the file and msg Erik> classes (in schema.py). Guess what this attribute will Erik> hold.. :-). A boolean attribute 'spambayes_misclassified' should Erik> also be added. >> >> When do you know it's been misclassified? My thought would be that >> you have to save all submissionss which score as spam for some period >> of time, probably with some unique identifier (an incrementing >> counter would be sufficient). That unique identifier has to >> propagate to the SpamBayes server. Later on, if you determine that a >> submission was misclassifed, you use that unique id to retrieve the >> info you saved and pump it into the tracker. >> Erik> My idea was to set it to False for all file/msg instances that Erik> have been successfully classified, and then add a button that Erik> allows ordinary users to tag the file/msg as misclassified, which Erik> would allow a coordinator to visit the message and press either a Erik> 'mark as spam' or a 'mark as ham' button. The former would set Erik> spambayes_score to 1.0 and submit the message for training as Erik> spam. The latter would set spambayes_score to 0.0 and submit the Erik> message for training as ham. Both would clear the Erik> spambayes__misclassified flag (set it to False). Erik> Does this sound reasonable to you? 
It would work, but then you'd wind up exposing spam to the search engine spiders for some period of time (maybe days if it's in a lightly visited corner of the tracker). That might be all the spammer needs (presuming he's trying to leverage the tracker to boost search engine ranking). >> I would hide all submissions which score as spam, whether anonymous >> or known. Only admins should be able to see spam submissions. >> Erik> Yeah, that's probably the best way to do it. This is quite a lot Erik> of work, of course, especially if you're new to roundup. Let me Erik> think about this to simpler. >> >> Yeah, that's pretty much beyond my capability. I simply don't have >> the time to become a Roundup expert. >> Erik> Well, I'll see if I can find the time to do some of the Erik> work. Depends a bit on the weather.. :-). I'll be very happy if Erik> you can contribute with some of your knowledge by inspecting my Erik> code and answer my questions. That I can do. Just let me know any time you have something you want me to look at. Erik> It's been a while since I did anti-spam stuff. Fiddled a lot with Erik> SMTP filters and spamassassin some years ago. This feature wakes Erik> up some of the interest I had in the subject. Erik> On the matter of training - will spambayes work best if it gets Erik> trained on about the same amount of spam messages as ham messages? Erik> That is, if we're training it on 5 spam messages, should we make Erik> sure we also train it on 5 ham messages? Generally, yes. Relatively equal amounts are best, though a 3:1 ratio isn't that big a deal. In my experience with this type of usage (I implemented this for the Mojam and Musi-Cal web servers a couple years ago) it's extremely accurate. I never needed more than 15-20 hams or spams total in my training database. The synthetic tokens you can generate will be extremely helpful in discriminating ham from spam. In the Mojam application, spammers were hitting our concert submission form. 
They were obviously entering complete garbage for the city/state/country. Consequently, whether or not I could find the city in my lat/long database was an exceedingly good indicator of spamminess (or user typos - which was a nice side benefit). Skip From martin at v.loewis.de Wed Jul 25 18:45:29 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 25 Jul 2007 18:45:29 +0200 Subject: [Tracker-discuss] spam auditor checked in In-Reply-To: <46A76C38.3090609@efod.se> References: <18028.7288.172843.721394@montanaro.dyndns.org> <1181721411.10879.7.camel@localhost.localdomain> <18032.5257.477481.918007@montanaro.dyndns.org> <46A67D38.4030600@efod.se> <18086.42227.992337.67040@montanaro.dyndns.org> <46A72F41.2090401@efod.se> <18087.20000.777804.122533@montanaro.dyndns.org> <46A76C38.3090609@efod.se> Message-ID: <46A77E29.10306@v.loewis.de> > My idea was to set it to False for all file/msg instances that have been > successfully classified, and then add a button that allows ordinary > users to tag the file/msg as misclassified As a minor detail on usability: I think I would prefer a link rather than a button for that, since a link is less intrusive - even though it may otherwise violate UI design guidelines to have a link that actually invokes an action. Regards, Martin From metatracker at psf.upfronthosting.co.za Fri Jul 27 18:13:46 2007 From: metatracker at psf.upfronthosting.co.za (Erik Forsberg) Date: Fri, 27 Jul 2007 16:13:46 -0000 Subject: [Tracker-discuss] [issue120] Allow users to report msg/file instances as misclassified Message-ID: <87sl79vn7r.fsf@uterus.efod.se> New submission from Erik Forsberg: Currently, there's no way for users to report msg/file instances that have been misclassified by spambayes. There is a property for it on file and msg, but no frontend. 
---------- messages: 643 nosy: forsberg priority: feature status: unread title: Allow users to report msg/file instances as misclassified _______________________________________________________ Meta Tracker _______________________________________________________ From metatracker at psf.upfronthosting.co.za Fri Jul 27 18:23:01 2007 From: metatracker at psf.upfronthosting.co.za (Erik Forsberg) Date: Fri, 27 Jul 2007 16:23:01 -0000 Subject: [Tracker-discuss] [issue121] Allow re-send of misclassified ham Message-ID: <87myxhvmsc.fsf@uterus.efod.se> New submission from Erik Forsberg: Currently, there's no way to resend a message/file that was initially misclassified as spam to the qa mailinglist when the message is reclassified as ham. Let's deal with this if it becomes a problem, right now, it's low prio. ---------- messages: 644 nosy: forsberg status: unread title: Allow re-send of misclassified ham _______________________________________________________ Meta Tracker _______________________________________________________ From metatracker at psf.upfronthosting.co.za Fri Jul 27 19:07:14 2007 From: metatracker at psf.upfronthosting.co.za (Erik Forsberg) Date: Fri, 27 Jul 2007 17:07:14 -0000 Subject: [Tracker-discuss] [issue120] Allow users to report msg/file instances as misclassified Message-ID: <1185556034.3.0.795939386519.issue120@psf.upfronthosting.co.za> Erik Forsberg added the comment: Coordinators should be alerted when a msg/file is marked as misclassified, to allow them to reclassify the msg/file quickly. This can be done with a detector. 
---------- status: unread -> chatting _______________________________________________________ Meta Tracker _______________________________________________________
From metatracker at psf.upfronthosting.co.za Fri Jul 27 19:14:52 2007 From: metatracker at psf.upfronthosting.co.za (Erik Forsberg) Date: Fri, 27 Jul 2007 17:14:52 -0000 Subject: [Tracker-discuss] [issue105] Dealing with spam In-Reply-To: <1174422502.6.0.765978079996.issue105@psf.upfronthosting.co.za> (A. M. Kuchling's message of "Tue, 20 Mar 2007 20:28:22 -0000") Message-ID: <87bqdxvkdx.fsf@uterus.efod.se>
Erik Forsberg added the comment:
An anti-spam system based on SpamBayes has now been implemented. Here's how it works:
*) When a msg or file instance is created, it is sent to a SpamBayes XMLRPC server running on psf, listening on localhost.
*) SpamBayes assigns a score. This is stored in a property (spambayes_score) on the msg/file instance.
*) If there's an error talking to the SpamBayes server, the msg/file will be let through, with the spambayes_misclassified boolean property set to True.
*) Msg/file instances that have a spambayes_score higher than the spambayes_threshold value are marked as spam. Their contents are not visible to anonymous users.
*) There are two buttons available on each msg/file instance's page (note: currently not in the issue view, you have to click the 'edit' link for file instances, and the link on the msg number for msg instances) that allow Coordinators to reclassify messages as spam or ham. This reclassification also trains SpamBayes, again by talking to the XMLRPC server.
Example location where you can train a file: http://bugs.python.org/file8099
Example location where you can train a msg: http://bugs.python.org/msg52680
Note: You need to be logged in as a user with the Coordinator role to see the buttons.
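The flow in the list above could be sketched roughly as follows. This is plain Python rather than the actual Roundup detector code. Only the property names spambayes_score and spambayes_misclassified and the "higher than the threshold" rule come from the description. The function names, the is_spam key, the 0.9 default threshold and storing None as the score after a server error are assumptions made for illustration.

```python
def classify_submission(score_fn, content, threshold=0.9):
    """Return the properties to store on a newly created msg/file instance.

    score_fn:  callable returning a spam score for `content`; in the real
               setup this would wrap the call to the SpamBayes XMLRPC server.
    threshold: stand-in for spambayes_threshold (0.9 is a made-up value).
    """
    try:
        score = float(score_fn(content))
    except Exception:
        # Error talking to the server: let the submission through, but set
        # the misclassified flag so it can be reviewed later.
        # (Storing None as the score is an assumption.)
        return {
            "spambayes_score": None,
            "spambayes_misclassified": True,
            "is_spam": False,
        }
    return {
        "spambayes_score": score,
        "spambayes_misclassified": False,
        # Marked as spam only when the score is higher than the threshold.
        "is_spam": score > threshold,
    }


def content_visible(props, is_anonymous):
    """Spam contents are hidden from anonymous users only."""
    return not (props["is_spam"] and is_anonymous)
```

Passing the scoring call in as score_fn keeps the decision logic separate from the network call, so the error path (server down, submission let through and flagged) can be tried out without a running SpamBayes server.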
Some small issues (that can wait) remain, see this list: http://psf.upfronthosting.co.za/roundup/meta/issue?%40search_text=&title=&%40columns=title&topic=4&id=&%40columns=id&creation=&creator=&activity=&%40columns=activity&%40sort=activity&actor=&priority=&%40group=priority&status=-1%2C1%2C2%2C3%2C4%2C5%2C6%2C7&%40columns=status&assignedto=&%40columns=assignedto&%40pagesize=50&%40startwith=0&%40queryname=&%40old-queryname=&%40action=search Currently, the SpamBayes database is not very well trained, as there was not much spam available (it was lost when I did a reimport with the new importer a few days ago). This is probably the first time I actually want some spam.. :-). This system has been implemented both for the python-dev instance and for the meta tracker. \EF -- Erik Forsberg http://efod.se GPG/PGP Key: 1024D/0BAC89D9 ---------- assignedto: montanaro -> forsberg nosy: +forsberg status: chatting -> resolved _______________________________________________________ Meta Tracker _______________________________________________________ From metatracker at psf.upfronthosting.co.za Sat Jul 28 00:11:04 2007 From: metatracker at psf.upfronthosting.co.za (Erik Forsberg) Date: Fri, 27 Jul 2007 22:11:04 -0000 Subject: [Tracker-discuss] [issue105] Dealing with spam Message-ID: <1185574264.33.0.187747856694.issue105@psf.upfronthosting.co.za> Erik Forsberg added the comment: Solution has been documented at http://www.mechanicalcat.net/tech/roundup/wiki/SpamBayesIntegration ---------- status: resolved -> chatting _______________________________________________________ Meta Tracker _______________________________________________________ From metatracker at psf.upfronthosting.co.za Sat Jul 28 01:19:40 2007 From: metatracker at psf.upfronthosting.co.za (Erik Forsberg) Date: Fri, 27 Jul 2007 23:19:40 -0000 Subject: [Tracker-discuss] [issue118] Deploy a detector for writing changes.xml Message-ID: <1185578380.12.0.26731754509.issue118@psf.upfronthosting.co.za> Erik Forsberg added
the comment: We need a Python 2.4.4-compatible version for it to run on the hosted machine. Sorry about the delay and lack of information. For the rest of your scripts, you might need to know that with the new anti-spam measures now in place (see issue105), you may get an HTTP 403 error if you try to access the contents of a file that has been classified as spam without logging in. \EF ---------- assignedto: -> forsberg nosy: +forsberg status: unread -> deferred _______________________________________________________ Meta Tracker _______________________________________________________ From metatracker at psf.upfronthosting.co.za Sat Jul 28 12:53:27 2007 From: metatracker at psf.upfronthosting.co.za (Michal Kwiatkowski) Date: Sat, 28 Jul 2007 10:53:27 -0000 Subject: [Tracker-discuss] [issue118] Deploy a detector for writing changes.xml Message-ID: <1185620007.63.0.507530825857.issue118@psf.upfronthosting.co.za> Michal Kwiatkowski added the comment: I fixed the detector, so now it should work with Python 2.4. Thanks for the hint about the anti-spam feature. As part of the SoC project I will script a bot which will automatically post patch reports as comments for corresponding issues. I will use my ruby tracker account for this (at least for now). Are there any possible problems I should look out for? _______________________________________________________ Meta Tracker _______________________________________________________ -------------- next part -------------- A non-text attachment was scrubbed...
Name: changes_xml_writer-2.4.py Type: text/x-python Size: 6977 bytes Desc: not available Url : http://mail.python.org/pipermail/tracker-discuss/attachments/20070728/ba8d3b12/attachment.py From metatracker at psf.upfronthosting.co.za Sun Jul 29 13:58:40 2007 From: metatracker at psf.upfronthosting.co.za (Erik Forsberg) Date: Sun, 29 Jul 2007 11:58:40 -0000 Subject: [Tracker-discuss] [issue118] Deploy a detector for writing changes.xml Message-ID: <1185710320.33.0.954569337067.issue118@psf.upfronthosting.co.za> Erik Forsberg added the comment: New version of changes_xml_writer.py added to svn, bugs.python.org updated and restarted. Should work now. I don't think there's anything special you need to think about as long as the stuff you post is not spam :-). I don't know how you are planning to post the comments, but the e-mail interface is probably easier to use than the web interface. ---------- status: deferred -> resolved _______________________________________________________ Meta Tracker _______________________________________________________ From forsberg at efod.se Sun Jul 29 21:35:52 2007 From: forsberg at efod.se (Erik Forsberg) Date: Sun, 29 Jul 2007 21:35:52 +0200 Subject: [Tracker-discuss] spam auditor checked in In-Reply-To: <18087.31986.211279.285383@montanaro.dyndns.org> (skip@pobox.com's message of "Wed, 25 Jul 2007 11:40:18 -0500") References: <18028.7288.172843.721394@montanaro.dyndns.org> <1181721411.10879.7.camel@localhost.localdomain> <18032.5257.477481.918007@montanaro.dyndns.org> <46A67D38.4030600@efod.se> <18086.42227.992337.67040@montanaro.dyndns.org> <46A72F41.2090401@efod.se> <18087.20000.777804.122533@montanaro.dyndns.org> <46A76C38.3090609@efod.se> <18087.31986.211279.285383@montanaro.dyndns.org> Message-ID: <87ir83ovdz.fsf@uterus.efod.se> skip at pobox.com writes: > Erik> you can contribute with some of your knowledge by inspecting my > Erik> code and answer my questions. > > That I can do.
Just let me know any time you have something you > want me to look at. Please have a look at the 'extract_classinfo' function in http://svn.python.org/view/tracker/instances/spambayes_integration/detectors/spambayes.py?rev=56590&view=markup and tell me if you can come up with any other tokens we should use. Also, regarding the 'authorage' token: currently, it's the number of seconds since the author was created. Perhaps it should instead be mapped to strings, something like this:
0-5s: 5s
5-10s: 10s
10-15s: 15s
15-20s: 20s
20-25s: 25s
25-30s: 30s
30-60s: 60s
60-90s: 90s
90-120s: 2m
120-300s: 5m
300-600s: 10m
600-1200s: 20m
..and so on. I imagine these will be better tokens? For example, if two messages created by (two different) spambots are created 8 and 9 seconds respectively after author creation, they will both be assigned to the '10s' token, instead of being assigned to two different tokens. \EF -- Erik Forsberg http://efod.se GPG/PGP Key: 1024D/0BAC89D9
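The bucketing proposed for the 'authorage' token could be sketched roughly like this. It is a minimal illustration, not the code in spambayes.py: the function name authorage_bucket is hypothetical, each upper bound is assumed to be inclusive (an age of exactly 5 seconds lands in '5s'), and since the message leaves the buckets beyond 20 minutes open ("and so on"), the '>20m' catch-all is a placeholder of this sketch only.

```python
from bisect import bisect_left

# Upper bounds in seconds and the token label for each bucket, taken from
# the list in the message above.
BOUNDS = [5, 10, 15, 20, 25, 30, 60, 90, 120, 300, 600, 1200]
LABELS = ["5s", "10s", "15s", "20s", "25s", "30s",
          "60s", "90s", "2m", "5m", "10m", "20m"]


def authorage_bucket(age_seconds):
    """Map the author's age in seconds to a coarse token label."""
    # bisect_left finds the first bound that is >= age_seconds, which
    # makes every upper bound inclusive (an assumption, see above).
    index = bisect_left(BOUNDS, age_seconds)
    if index == len(BOUNDS):
        # Placeholder for the buckets the message does not spell out.
        return ">20m"
    return LABELS[index]


# Two spambots posting 8 and 9 seconds after account creation end up
# with the same token.
same_bucket = authorage_bucket(8) == authorage_bucket(9)
```

Keeping the bounds in one sorted list makes it easy to extend the table once the larger buckets have been decided.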