[scikit-learn] Supervised anomaly detection in time series

Thu Aug 4 20:42:25 EDT 2016

If I train multiple algorithms on different subsamples, how do I combine
them into a final classifier that predicts on unseen data?
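
One common way to get a single predictor out of several subsample-trained
models is to average their predicted probabilities and threshold the result.
The following is only a minimal sketch under assumptions: the data is
synthetic, the helper names fit_subsample_ensemble and ensemble_predict_proba
are hypothetical, and LogisticRegression, the 3:1 negative-to-positive ratio
and the 0.5 threshold are illustrative choices, not something suggested in
this thread.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_subsample_ensemble(X, y, n_models=25, neg_per_pos=3, seed=0):
    """Train one model per roughly balanced subsample (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    n_neg = min(len(neg_idx), neg_per_pos * len(pos_idx))
    models = []
    for _ in range(n_models):
        # Keep every positive; draw a fresh random subset of the negatives.
        sub_neg = rng.choice(neg_idx, size=n_neg, replace=False)
        idx = np.concatenate([pos_idx, sub_neg])
        models.append(LogisticRegression().fit(X[idx], y[idx]))
    return models


def ensemble_predict_proba(models, X):
    """Average the positive-class probability over all models."""
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)


# Synthetic stand-in for windowed accelerometer features (assumed data).
rng = np.random.RandomState(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 4)),   # negatives
               rng.normal(3.0, 1.0, size=(5, 4))])    # positives
y = np.r_[np.zeros(500, dtype=int), np.ones(5, dtype=int)]

models = fit_subsample_ensemble(X, y)
scores = ensemble_predict_proba(models, X)
y_pred = (scores >= 0.5).astype(int)  # the threshold is a tunable assumption
```

Because each model sees a different slice of the negative class, the ensemble
as a whole still uses most of the negative samples even though every
individual model is trained on only a few of them.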

I have very few positive samples because this is speed bump detection and
a drive contains very few speed bumps.
However, I think that unseen new data would be quite similar to what I
have in my training data, so if I can learn a classifier that correctly
handles these 5, I hope it will work well for unseen speed bumps.
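
With only 5 positives, one way to check this is stratified cross-validation
scored with something other than accuracy. The following is a rough sketch
under assumptions: the data is synthetic, and the choice of
LogisticRegression with class_weight="balanced", 5 folds and average
precision as the metric is illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in for the real features: 500 negatives, 5 positives.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 4)),
               rng.normal(3.0, 1.0, size=(5, 4))])
y = np.r_[np.zeros(500, dtype=int), np.ones(5, dtype=int)]

# 5 stratified folds put exactly one positive sample in each test fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
clf = LogisticRegression(class_weight="balanced")

# Out-of-fold scores: every sample is scored by a model that never saw it.
scores = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]

ap = average_precision_score(y, scores)
recall = recall_score(y, (scores >= 0.5).astype(int))

# Accuracy of always predicting "no speed bump", for comparison.
baseline_accuracy = np.mean(y == 0)

print("average precision:", ap)
print("recall on positives:", recall)
print("accuracy of the all-negative baseline:", baseline_accuracy)
```

The all-negative baseline already reaches about 99% accuracy on data this
skewed, which is why accuracy says little here, while average precision and
recall depend directly on how the 5 positives are ranked.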

Thanks,
Amita

On Thu, Aug 4, 2016 at 5:23 PM, Nicolas Goix <goix.nicolas at gmail.com> wrote:

> You can evaluate your hyper-parameters on a few samples.
> Just don't use accuracy as your performance measure.
>
> For supervised classification, training multiple algorithms on small
> balanced subsamples usually works well, but 5 anomalies does indeed seem
> to be very few.
>
> Nicolas
>
> On Aug 4, 2016 7:51 PM, "Amita Misra" <amisra2 at ucsc.edu> wrote:
>
>> Sub-sampling would remove a lot of information from the negative class.
>> I have more than 500 samples of the negative class and just 5 samples of
>> the positive class.
>>
>> Amita
>>
>> On Thu, Aug 4, 2016 at 4:43 PM, Nicolas Goix <goix.nicolas at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Yes, you can use your labeled data to learn your hyper-parameters
>>> through CV (you will need to sub-sample your normal class to get similar
>>> proportions of normal and abnormal samples).
>>>
>>> You can also try supervised classification algorithms on sub-samples
>>> that are "not too highly unbalanced".
>>>
>>> Nicolas
>>>
>>> On Thu, Aug 4, 2016 at 5:17 PM, Amita Misra <amisra2 at ucsc.edu> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am currently exploring the problem of speed bump detection using
>>>> accelerometer time series data.
>>>> I have extracted some features (mean, standard deviation, etc.) within
>>>> a time window.
>>>>
>>>> Since the dataset is highly skewed (I have just 5 positive samples
>>>> for every > 300 samples),
>>>> I was looking into
>>>>
>>>> sklearn.svm.OneClassSVM
>>>> sklearn.covariance.EllipticEnvelope
>>>> sklearn.ensemble.IsolationForest
>>>>
>>>> but I am not sure how to use them.
>>>>
>>>> What I understand from the docs is that I should separate out the
>>>> positive examples and train using only the negative examples:
>>>>
>>>> clf.fit(X_train)
>>>>
>>>> and then predict on the positive examples using
>>>>
>>>> clf.predict(X_test)
>>>>
>>>>
>>>> I am not sure what role the positive examples in my training dataset
>>>> then play, or how I can use them to improve my classifier so that it
>>>> predicts better on new samples.
>>>>
>>>>
>>>> Can we do something like cross-validation to learn the parameters, as in
>>>> normal binary SVM classification?
>>>>
>>>> Thanks,
>>>> Amita
>>>>
>>>> Amita Misra
>>>> Graduate Student Researcher
>>>> Natural Language and Dialogue Systems Lab
>>>> Baskin School of Engineering
>>>> University of California Santa Cruz
>>>>
>>>>
>>>> _______________________________________________
>>>> scikit-learn mailing list
>>>> scikit-learn at python.org
>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>
>>>>
>>

-- 
Amita Misra
Graduate Student Researcher
Natural Language and Dialogue Systems Lab
Baskin School of Engineering
University of California Santa Cruz