[scikit-learn] suggested machine learning algorithm

Sat Oct 1 15:58:39 EDT 2016

Maybe it’s worth switching to LOOCV, since your estimate may have a bit of a pessimistic bias here due to the small training set size (a bootstrap sample contains, asymptotically, only about 63.2% of the distinct observations, i.e. 1 - 1/e, so each model is trained on noticeably fewer than 42 unique samples). I would try both linear and nonlinear models. Instead of adding more features, maybe also try to eliminate some via L1 regularization, feature selection, or feature extraction, in addition to trying different algorithms such as random forests, Gaussian processes, RBF-kernel SVM regression, and so forth.
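
A minimal sketch of what such a comparison could look like with LOOCV. Everything here is an assumption for illustration: the data are a synthetic stand-in generated with make_regression (42 rows, 4 features), not the actual dataset from the thread, and the hyperparameters (C, n_estimators, the GP kernel) are arbitrary starting points rather than recommendations. Because each LOOCV fold holds out a single observation, a per-fold correlation is undefined, so the sketch pools the held-out predictions and computes one Pearson R over all 42 of them.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in (assumption): 42 observations, 4 features.
# Replace X, y with the real descriptors and experimental values.
X, y = make_regression(n_samples=42, n_features=4, n_informative=2,
                       noise=10.0, random_state=0)

# Candidate models; scaling lives inside the pipeline so that it is
# re-fit on the training part of every fold (no leakage).
models = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "lasso (L1)": make_pipeline(StandardScaler(), LassoCV(cv=5)),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "RBF SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
    "gaussian process": make_pipeline(
        StandardScaler(),
        GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                                 normalize_y=True, random_state=0),
    ),
}


def loocv_correlation(model, X, y):
    """Pearson R between y and the pooled leave-one-out predictions."""
    pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    return np.corrcoef(y, pred)[0, 1]


results = {name: loocv_correlation(model, X, y)
           for name, model in models.items()}

for name, r in results.items():
    print(f"{name:>16s}: R = {r:.3f}")
```

With only 42 observations, differences between models of a few hundredths in R are likely within the noise, so the ranking is better read as a rough screen than as a final model choice.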

> On Oct 1, 2016, at 10:59 AM, Thomas Evangelidis <tevang3 at gmail.com> wrote:
> 
> Dear scikit-learn users and developers,
> 
> I have a dataset consisting of 42 observations (molnames) and 4 variables (VDWAALS, EEL, EGB, ESURF), with which I want to build a predictive model that estimates the experimental value (Expr). I tried multivariate linear regression using 10,000 bootstrap repeats, each time using 21 observations for training and the remaining 21 for testing, but the average correlation was only R = 0.1727 ± 0.19779.
> 
> 
> molname        VDWAALS    EEL        EGB        ESURF     Expr
> CHEMBL108457   -20.4848   -96.5826    23.4584   -5.4045   -7.27193
> CHEMBL388269   -50.3860    28.9403   -51.5147   -6.4061   -6.8022
> CHEMBL244078   -49.1466   -21.9869    17.7999   -6.4588   -6.61742
> CHEMBL244077   -53.4365   -32.8943    34.8723   -7.0384   -6.61742
> CHEMBL396772   -51.4111   -34.4904    36.0326   -6.5443   -5.82207
> ........
> 
> I would like your advice about what other machine learning algorithms I could try with these data. E.g., can I build a decision tree, or are the observations and variables too few to avoid overfitting? I could include more variables, but the number of observations will always remain 42.
> 
> I would greatly appreciate any advice!
> 
> Thomas
> 