[scikit-learn] combining arrays of features to train an MLP

Mon Dec 19 16:56:56 EST 2016

Does this mean that both are feasible?

On 19 December 2016 at 18:17, Sebastian Raschka <se.raschka at gmail.com>
wrote:

> Thanks, Thomas, that makes sense! Will submit a PR then to update the
> docstring.
>
> Best,
> Sebastian
>
>
> > On Dec 19, 2016, at 11:06 AM, Thomas Evangelidis <tevang3 at gmail.com>
> > wrote:
> >
> > 
> > Greetings,
> >
> > My dataset consists of objects characterised by their structural
> > features, which are encoded into a so-called "fingerprint" form. There
> > are several different types of fingerprints, each one encapsulating a
> > different type of information. I want to combine two specific types of
> > fingerprints to train an MLP regressor. The first fingerprint consists
> > of a 2048-bit array of the form:
> >
> >  FP1 = array([ 1.,  1.,  0., ...,  0.,  0.,  1.], dtype=float32)
> >
> > The second is an array of 60 floats of the form:
> >
> > FP2 = array([ 2.77494618,  0.98973243,  0.34638652,  2.88303715,  1.31473857,
> >        -0.56627112,  4.78847547,  2.29587913, -0.6786228 ,  4.63391109,
> >        ...
> >         0.        ,  0.        ,  5.89652792,  0.        ,  0.        ])
> >
> > At first I tried to fuse them into a single 1D array of 2048+60 columns,
> > but the predictions of the MLP were worse than those of the 2 separate
> > MLP models, each trained on one of the 2 fingerprint types. My question:
> > is there a more effective way to combine the 2 fingerprints in order to
> > indicate that they represent different types of information?
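
One thing that may be worth trying with the concatenation described above is standardizing the columns before training, since MLPs are sensitive to feature scaling and the two fingerprints live on different scales. The sketch below is only an illustration under assumptions: the data is random and synthetic, and the sample count, hidden layer size, and iteration limit are placeholders rather than values from this thread.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
n_samples = 50  # assumption: synthetic stand-in for the real dataset

# Placeholder fingerprints: FP1 is a 2048-bit vector, FP2 holds 60 floats.
fp1 = rng.randint(0, 2, size=(n_samples, 2048)).astype(np.float32)
fp2 = rng.normal(size=(n_samples, 60))
y = rng.normal(size=n_samples)  # synthetic target values

# One row per object, 2048 + 60 = 2108 columns.
X = np.hstack([fp1, fp2])

# Standardize every column so that the 60 real-valued features are on a
# scale comparable to the 2048 binary ones before they reach the MLP.
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32,), max_iter=50, random_state=0),
)
model.fit(X, y)
pred = model.predict(X)
```

Whether this actually helps on the real fingerprints would have to be checked by cross-validation; the sketch only shows the mechanics.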
> >
> > To this end, I tried to create a 2-row array (1st row 2048 elements and
> > 2nd row 60 elements), but sklearn complained:
> >
> >     mlp.fit(x_train,y_train)
> >   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 618, in fit
> >     return self._fit(X, y, incremental=False)
> >   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 330, in _fit
> >     X, y = self._validate_input(X, y, incremental)
> >   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 1264, in _validate_input
> >     multi_output=True, y_numeric=True)
> >   File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 521, in check_X_y
> >     ensure_min_features, warn_on_dtype, estimator)
> >   File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 402, in check_array
> >     array = array.astype(np.float64)
> > ValueError: setting an array element with a sequence.
> > 
> >
> > Then I tried to create, for each object of the dataset, a 2D array of
> > size 2x2048 by adding 1988 zeros to the second row so that both rows
> > are of equal size. However, sklearn complained again:
> >
> >
> >     mlp.fit(x_train,y_train)
> >   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 618, in fit
> >     return self._fit(X, y, incremental=False)
> >   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 330, in _fit
> >     X, y = self._validate_input(X, y, incremental)
> >   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 1264, in _validate_input
> >     multi_output=True, y_numeric=True)
> >   File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 521, in check_X_y
> >     ensure_min_features, warn_on_dtype, estimator)
> >   File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 405, in check_array
> >     % (array.ndim, estimator_name))
> > ValueError: Found array with dim 3. Estimator expected <= 2.
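
Both tracebacks come down to the same requirement: the estimator expects X as a 2D numeric array with one row per object. The NumPy-only sketch below reproduces the two situations on dummy data; the number of objects is a made-up placeholder and the arrays are all zeros.

```python
import numpy as np

n_samples = 4  # assumption: hypothetical number of objects

# Case 1: rows of unequal length (2048 and 60) can only be stored as an
# object array, and an object array of sequences cannot be cast to float64.
ragged = np.empty(2, dtype=object)
ragged[0] = np.zeros(2048)
ragged[1] = np.zeros(60)
try:
    ragged.astype(np.float64)
    ragged_error = None
except ValueError as exc:
    ragged_error = str(exc)

# Case 2: one padded 2x2048 block per object stacks into a 3D array,
# which has one dimension too many.
padded = np.zeros((n_samples, 2, 2048))

# Flattening each object into a single row restores the 2D layout.
flat = padded.reshape(n_samples, -1)
```

Flattening the padded blocks only adds constant zero columns, so it carries no more information than the plain 2048+60 concatenation.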
> >
> >
> > In another case of fingerprints, let's name them FP3 and FP4, I observed
> > that the MLP regressor created using FP3 yielded better results when
> > trained and evaluated using logarithmically transformed experimental
> > values (the values in the y_train and y_test 1D arrays), while the MLP
> > regressor created using FP4 yielded better results using the original
> > experimental values. So my second question is: when combining both FP3
> > and FP4 into a single array, is there any way to designate to the MLP
> > that the features that correspond to FP3 must reproduce the logarithmic
> > transform of the experimental values, while the features of FP4 must
> > reproduce the original untransformed experimental values?
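
As far as I know, MLPRegressor has no option to tie a subset of input columns to a particular target transform. One possible workaround, which is not something proposed in this thread, is to keep two separate models and average their predictions on the original scale. The sketch below assumes a recent scikit-learn that provides TransformedTargetRegressor; the data is synthetic, and the feature counts, layer sizes, and equal averaging weights are arbitrary placeholders.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
n_samples = 40  # assumption: synthetic stand-in for the real dataset
fp3 = rng.normal(size=(n_samples, 20))      # placeholder for FP3
fp4 = rng.normal(size=(n_samples, 10))      # placeholder for FP4
y = rng.uniform(1.0, 10.0, size=n_samples)  # positive, so log(y) is defined

# FP3 model: trained on log(y); predictions are mapped back with exp().
model_fp3 = TransformedTargetRegressor(
    regressor=MLPRegressor(hidden_layer_sizes=(16,), max_iter=50,
                           random_state=0),
    func=np.log,
    inverse_func=np.exp,
)
model_fp3.fit(fp3, y)

# FP4 model: trained on the untransformed values.
model_fp4 = MLPRegressor(hidden_layer_sizes=(16,), max_iter=50,
                         random_state=0)
model_fp4.fit(fp4, y)

# Both predictions are now on the original scale and can be averaged.
pred = 0.5 * (model_fp3.predict(fp3) + model_fp4.predict(fp4))
```

The averaging weights could also be tuned on a validation set instead of being fixed at 0.5.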
> >
> >
> > I would greatly appreciate any advice on either of my 2 queries.
> > Thomas
> >
> > --
> > ======================================================================
> > Thomas Evangelidis
> > Research Specialist
> > CEITEC - Central European Institute of Technology
> > Masaryk University
> > Kamenice 5/A35/1S081,
> > 62500 Brno, Czech Republic
> >
> > email: tevang at pharm.uoa.gr
> >               tevang3 at gmail.com
> >
> > website: https://sites.google.com/site/thomasevangelidishomepage/
> >
> >
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn

-- 
======================================================================
Thomas Evangelidis
Research Specialist
CEITEC - Central European Institute of Technology
Masaryk University
Kamenice 5/A35/1S081,
62500 Brno, Czech Republic

email: tevang at pharm.uoa.gr
       tevang3 at gmail.com

website: https://sites.google.com/site/thomasevangelidishomepage/