[scikit-learn] partial_fit implementation for IsolationForest

Andreas Mueller t3kcit at gmail.com
Fri May 27 18:10:41 EDT 2016


How about Mondrian forests ;)


On 05/26/2016 09:28 AM, Dale T Smith wrote:
>
> I think your idea is an excellent candidate for scikit-learn-contrib
>
> https://github.com/scikit-learn-contrib/scikit-learn-contrib
>
> On Thursday, May 26, 2016 8:51 AM, Nicolas Goix wrote:
>
> Hello Isaak,
>
> There is a paper from the same authors as iforest but for streaming 
> data: http://ijcai.org/Proceedings/11/Papers/254.pdf
>
>
> For now it is not cited enough (24 citations) to satisfy the sklearn
> inclusion requirements. While waiting for more citations, it could
> be a nice addition to scikit-learn-contrib.
>
> Otherwise, we could imagine extending iforest to streaming data by
> building new trees when data arrive (and removing the oldest ones),
> with prediction still based on the average depth over the forest.
> I'm not sure this heuristic could be merged into scikit-learn, since
> it is not based on well-cited papers. At the same time, it is a
> natural and simple extension of iforest to streaming data...
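
A minimal sketch of the heuristic described above, under stated assumptions. `SlidingWindowIForest` and its parameters are hypothetical names, not part of scikit-learn. Each incoming batch trains a small `IsolationForest`, the oldest one is dropped once the window is full, and the anomaly score is the mean of the per-forest `score_samples` values. That mean stands in for "average depth over the forest"; it is an approximation, not the method from the paper.

```python
from collections import deque

import numpy as np
from sklearn.ensemble import IsolationForest


class SlidingWindowIForest:
    """Hypothetical wrapper (not a scikit-learn class)."""

    def __init__(self, max_forests=5, trees_per_batch=20, random_state=0):
        self.trees_per_batch = trees_per_batch
        self.random_state = random_state
        # A bounded deque discards its oldest entry when a new one is appended.
        self.forests_ = deque(maxlen=max_forests)
        self._n_batches = 0

    def partial_fit(self, X):
        # Grow new trees on the latest batch only.
        forest = IsolationForest(
            n_estimators=self.trees_per_batch,
            random_state=self.random_state + self._n_batches,
        )
        forest.fit(X)
        self.forests_.append(forest)
        self._n_batches += 1
        return self

    def score_samples(self, X):
        # Average over the forests currently in the window.
        # Lower values mean "more anomalous", as in IsolationForest.
        return np.mean([f.score_samples(X) for f in self.forests_], axis=0)


rng = np.random.RandomState(0)
detector = SlidingWindowIForest(max_forests=5)

# Simulate a stream of eight batches drawn around the origin.
for _ in range(8):
    detector.partial_fit(rng.normal(size=(200, 2)))

# Score one typical point and one point far from the training data.
scores = detector.score_samples(np.array([[0.0, 0.0], [8.0, 8.0]]))
print(scores)
```

The window size trades memory and reaction speed against stability: a smaller `max_forests` forgets old data faster but gives noisier scores.
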
>
> Any opinion on it?
>
> Nicolas
>
> 2016-05-26 13:32 GMT+02:00 Arthur Mensch <arthur.mensch at inria.fr>:
>
> Hi Isaak,
>
> You may have a look at MiniBatchKMeans and
> MiniBatchDictionaryLearning, which both provide this API. With each
> partial_fit call, you fit a single mini-batch to the estimator and
> update its internal attributes accordingly. During the first
> partial_fit call, you should take care of the various memory
> allocations that the estimator needs.
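
To illustrate the pattern described above, here is a minimal sketch with a toy estimator. `RunningMean` is a hypothetical class, not part of scikit-learn; it only tracks a per-feature mean. The first `partial_fit` call allocates the state, later calls update it, and `fit` starts from scratch.

```python
import numpy as np


class RunningMean:
    """Hypothetical toy estimator (not part of scikit-learn)."""

    def fit(self, X):
        # fit() forgets any earlier state and learns from X alone.
        for attr in ("mean_", "n_samples_seen_"):
            if hasattr(self, attr):
                delattr(self, attr)
        return self.partial_fit(X)

    def partial_fit(self, X):
        X = np.asarray(X, dtype=float)
        if not hasattr(self, "n_samples_seen_"):
            # First call: allocate the state that later calls update.
            self.n_samples_seen_ = 0
            self.mean_ = np.zeros(X.shape[1])
        total = self.n_samples_seen_ + X.shape[0]
        # Incremental update: combine the old sum with the new batch's sum.
        self.mean_ = (self.mean_ * self.n_samples_seen_ + X.sum(axis=0)) / total
        self.n_samples_seen_ = total
        return self


rng = np.random.RandomState(0)
X = rng.normal(size=(600, 3))

# Stream the data in six mini-batches of 100 rows each.
model = RunningMean()
for start in range(0, len(X), 100):
    model.partial_fit(X[start:start + 100])

# A single batch fit on the same data should agree with the streamed result.
batch_model = RunningMean().fit(X)
print(np.allclose(model.mean_, batch_model.mean_))
```
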
>
> Please feel free to create a pull request whenever you think your
> code is ready for review.
>
> Good luck!
>
> On 26 May 2016 at 13:14, <donkey-hotei at cryptolab.net> wrote:
>
> hello scikit-learn devs,
>
> After following the work on IsolationForest so far and testing it on
> a real-world problem here, we've found this model to be very
> promising for anomaly detection. However, at present IsolationForest
> only fits data in batch, even though it may be well suited to
> incremental on-line learning: one could subsample recent history and
> progressively drop older estimators.
>
> I'd like to contribute this feature, but being new to ML and 
> scikit-learn I'm curious how I should start making a quick & dirty 
> version to see how this may work. Are there other good examples where 
> one could see the difference between .fit and .partial_fit in other 
> models?
>
> thanks
> isaak y.
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org <mailto:scikit-learn at python.org>
> https://mail.python.org/mailman/listinfo/scikit-learn
