[scikit-learn] 1. Re: unclear help file for sklearn.decomposition.pca

Ismael Lemhadri lemhadri at stanford.edu
Mon Oct 16 14:27:11 EDT 2017


@Andreas Mueller:
My references do not assume centering, e.g.
http://ufldl.stanford.edu/wiki/index.php/PCA
Do you have a reference?
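To make the disagreement concrete, here is a minimal sketch. It assumes a recent NumPy and scikit-learn, and the toy data and the helper same_up_to_sign are made up for illustration. It compares PCA.components_ with the eigenvectors of the sample covariance matrix, which subtracts the mean by definition, and with the right singular vectors of the uncentered data, which generally point elsewhere when the mean is far from zero.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Hypothetical toy data: unequal spreads and a mean far from the origin
X = rng.rand(50, 3) * np.array([1.0, 2.0, 4.0]) + np.array([5.0, -3.0, 10.0])

pca = PCA(n_components=3, svd_solver="full").fit(X)

# Eigenvectors of the covariance matrix (np.cov subtracts the column means)
_, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
cov_axes = eigvecs[:, ::-1].T  # eigh sorts ascending; flip to decreasing variance

# Right singular vectors of the raw, uncentered data
_, _, Vt_raw = np.linalg.svd(X, full_matrices=False)


def same_up_to_sign(A, B):
    """Hypothetical helper: each row of A equals the matching row of B or its negation."""
    return all(np.allclose(a, b) or np.allclose(a, -b) for a, b in zip(A, B))


print(np.allclose(pca.mean_, X.mean(axis=0)))      # True: PCA stores the mean it removed
print(same_up_to_sign(pca.components_, cov_axes))  # True: matches covariance-based PCA
print(same_up_to_sign(pca.components_, Vt_raw))    # False: uncentered SVD differs
```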



On Mon, Oct 16, 2017 at 10:20 AM, <scikit-learn-request at python.org> wrote:

> Send scikit-learn mailing list submissions to
>         scikit-learn at python.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://mail.python.org/mailman/listinfo/scikit-learn
> or, via email, send a message with subject or body 'help' to
>         scikit-learn-request at python.org
>
> You can reach the person managing the list at
>         scikit-learn-owner at python.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of scikit-learn digest..."
>
>
> Today's Topics:
>
>    1. Re: unclear help file for sklearn.decomposition.pca
>       (Andreas Mueller)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 16 Oct 2017 13:19:57 -0400
> From: Andreas Mueller <t3kcit at gmail.com>
> To: scikit-learn at python.org
> Subject: Re: [scikit-learn] unclear help file for
>         sklearn.decomposition.pca
> Message-ID: <04fc445c-d8f3-a3a9-4ab2-0535826a2d03 at gmail.com>
> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>
> The definition of PCA has a centering step, but no scaling step.
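Andreas's point can be checked on toy data. The sketch below assumes a recent scikit-learn, and the data is made up. It shows that PCA subtracts the per-feature mean (so a constant offset has no effect) but does not rescale features (so changing the units of one feature changes the axes), and that standardizing explicitly with StandardScaler in a pipeline is one way to get the centered-and-scaled variant.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.rand(20, 3) * np.array([1.0, 2.0, 4.0])  # toy data with unequal spreads

pca = PCA(n_components=2, svd_solver="full").fit(X)

# 1) Centering: the fitted mean_ is the per-feature mean of the training data
mean_is_column_mean = np.allclose(pca.mean_, X.mean(axis=0))

# ... so shifting every sample by a constant leaves the scores unchanged
# (magnitudes are compared because each component's sign is arbitrary)
Z = pca.transform(X)
Z_shifted = PCA(n_components=2, svd_solver="full").fit_transform(X + 100.0)
shift_invariant = np.allclose(np.abs(Z), np.abs(Z_shifted))

# 2) No scaling: expressing feature 0 in much larger units changes the axes,
# and the first component now points almost entirely along feature 0
X_rescaled = X * np.array([1000.0, 1.0, 1.0])
first_axis = PCA(n_components=2, svd_solver="full").fit(X_rescaled).components_[0]

# Standardizing first (explicitly) makes the result independent of those units
scaled_pca = make_pipeline(StandardScaler(), PCA(n_components=2, svd_solver="full"))
Z_std = scaled_pca.fit_transform(X)
Z_std_rescaled = scaled_pca.fit_transform(X_rescaled)
unit_invariant = np.allclose(np.abs(Z_std), np.abs(Z_std_rescaled))

print(mean_is_column_mean, shift_invariant, unit_invariant)  # True True True
print(np.round(first_axis, 3))
```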
>
> On 10/16/2017 11:16 AM, Ismael Lemhadri wrote:
> > Dear Roman,
> > My concern is not that the help file omits the scaling, but that it
> > omits the centering.
> > That is, sklearn's PCA removes the mean, but the help file does not
> > mention it.
> > This was quite hard for me to debug, as I expected it either to
> > 1/ center and scale, or 2/ neither center nor scale.
> > In my opinion, it would help to make this behavior explicit in the
> > help file.
> > Ismael
> >
> > On Mon, Oct 16, 2017 at 8:02 AM, <scikit-learn-request at python.org> wrote:
> >
> >
> >
> >     Today's Topics:
> >
> >        1. unclear help file for sklearn.decomposition.pca (Ismael Lemhadri)
> >        2. Re: unclear help file for sklearn.decomposition.pca
> >           (Roman Yurchak)
> >        3. Question about LDA's coef_ attribute (Serafeim Loukas)
> >        4. Re: Question about LDA's coef_ attribute (Alexandre Gramfort)
> >        5. Re: Question about LDA's coef_ attribute (Serafeim Loukas)
> >
> >
> >     ----------------------------------------------------------------------
> >
> >     Message: 1
> >     Date: Sun, 15 Oct 2017 18:42:56 -0700
> >     From: Ismael Lemhadri <lemhadri at stanford.edu>
> >     To: scikit-learn at python.org
> >     Subject: [scikit-learn] unclear help file for
> >             sklearn.decomposition.pca
> >     Message-ID:
> >             <CANpSPFTgv+Oz7f97dandmrBBayqf_o9w=18oKHCFN0u5DNzj+g at mail.gmail.com>
> >     Content-Type: text/plain; charset="utf-8"
> >
> >     Dear all,
> >     The help file for the PCA class is unclear about the preprocessing
> >     performed on the data.
> >     You can check on line 410 here:
> >     https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/decomposition/pca.py#L410
> >     that the matrix is centered but NOT scaled before the singular
> >     value decomposition is performed.
> >     However, the help files make no mention of it.
> >     This is confusing for someone who, like me, just wanted to check
> >     that PCA and np.linalg.svd give the same results. In academic
> >     settings, students are often asked to compare different methods and
> >     to check that they yield the same results. I expect that many
> >     students have run into this problem before...
> >     Best,
> >     Ismael Lemhadri
> >     -------------- next part --------------
> >     An HTML attachment was scrubbed...
> >     URL:
> >     <http://mail.python.org/pipermail/scikit-learn/attachments/20171015/c465bde7/attachment-0001.html>
> >
> >     ------------------------------
> >
> >     Message: 2
> >     Date: Mon, 16 Oct 2017 15:16:45 +0200
> >     From: Roman Yurchak <rth.yurchak at gmail.com>
> >     To: Scikit-learn mailing list <scikit-learn at python.org>
> >     Subject: Re: [scikit-learn] unclear help file for
> >             sklearn.decomposition.pca
> >     Message-ID: <b2abdcfd-4736-929e-6304-b93832932043 at gmail.com>
> >     Content-Type: text/plain; charset=utf-8; format=flowed
> >
> >     Ismael,
> >
> >     as far as I can see, the sklearn.decomposition.PCA documentation
> >     doesn't mention scaling at all (except for the whiten parameter,
> >     which is a post-transformation scaling).
> >
> >     Since it doesn't mention scaling, it makes sense that it doesn't
> >     scale the input at all, the same as np.linalg.svd.
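Roman's description of whiten as a post-transformation scaling can be checked directly. A minimal sketch, assuming a recent scikit-learn and made-up toy data: with whiten=True, the projected scores are the unwhitened scores divided by sqrt(explained_variance_), which gives each component unit sample variance; the input itself is still only centered.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(30, 4) * np.array([1.0, 2.0, 3.0, 4.0])  # toy data

plain = PCA(n_components=2, svd_solver="full").fit(X)
white = PCA(n_components=2, svd_solver="full", whiten=True).fit(X)

Z_plain = plain.transform(X)
Z_white = white.transform(X)

# Whitening divides each output column by that component's standard deviation ...
rescaled_match = np.allclose(Z_white, Z_plain / np.sqrt(plain.explained_variance_))
# ... so every whitened component has unit sample variance (ddof=1, as in explained_variance_)
unit_variance = np.allclose(Z_white.var(axis=0, ddof=1), 1.0)

print(rescaled_match, unit_variance)  # True True
```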
> >
> >     You can verify that PCA and np.linalg.svd yield the same results, with
> >
> >     ```
> >     >>> import numpy as np
> >     >>> from sklearn.decomposition import PCA
> >     >>> X = np.random.RandomState(42).rand(10, 4)
> >     >>> n_components = 2
> >     >>> PCA(n_components=n_components, svd_solver='full').fit_transform(X)
> >     ```
> >
> >     and
> >
> >     ```
> >     >>> X_centered = X - X.mean(axis=0)  # the centering PCA does implicitly
> >     >>> U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
> >     >>> X_centered.dot(Vt[:n_components].T)  # project onto the top principal axes
> >     ```
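One caveat when running the comparison above: singular vectors are only defined up to sign, and scikit-learn applies its own sign convention, so the two outputs can differ by a sign flip in some columns. The sketch below (assuming a recent NumPy and scikit-learn) aligns the signs before comparing with np.allclose, and also shows that skipping the centering step breaks the agreement.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.RandomState(42).rand(10, 4)
n_components = 2

# scikit-learn: centers X internally (but does not scale it) before the SVD
Z_sklearn = PCA(n_components=n_components, svd_solver="full").fit_transform(X)

# NumPy: center explicitly, then project onto the leading right singular vectors
X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
Z_numpy = X_centered @ Vt[:n_components].T

# Flip each NumPy column whose sign disagrees with scikit-learn's convention
signs = np.sign(np.sum(Z_sklearn * Z_numpy, axis=0))
print(np.allclose(Z_sklearn, Z_numpy * signs))  # True

# Without centering, the projections no longer match, even ignoring signs
_, _, Vt_raw = np.linalg.svd(X, full_matrices=False)
Z_raw = X @ Vt_raw[:n_components].T
print(np.allclose(np.abs(Z_sklearn), np.abs(Z_raw)))  # False
```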
> >
> >     --
> >     Roman
> >
> >
> >
> >
> >     ------------------------------
> >
> >     Message: 3
> >     Date: Mon, 16 Oct 2017 15:27:48 +0200
> >     From: Serafeim Loukas <seralouk at gmail.com>
> >     To: scikit-learn at python.org
> >     Subject: [scikit-learn] Question about LDA's coef_ attribute
> >     Message-ID: <58C6D0DA-9DE5-4EF5-97C1-48159831F5A9 at gmail.com>
> >     Content-Type: text/plain; charset="us-ascii"
> >
> >     Dear Scikit-learn community,
> >
> >     Since the documentation of LDA
> >     (http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html)
> >     is not very clear, I would like to ask whether the lda.coef_
> >     attribute stores the eigenvectors from the SVD.
> >
> >     Thank you in advance,
> >     Serafeim
> >
> >     ------------------------------
> >
> >     Message: 4
> >     Date: Mon, 16 Oct 2017 16:57:52 +0200
> >     From: Alexandre Gramfort <alexandre.gramfort at inria.fr>
> >     To: Scikit-learn mailing list <scikit-learn at python.org>
> >     Subject: Re: [scikit-learn] Question about LDA's coef_ attribute
> >     Message-ID:
> >             <CADeotZricOQhuHJMmW2Z14cqffEQyndYoxn-OgKAvTMQ7V0Y2g at mail.gmail.com>
> >     Content-Type: text/plain; charset="UTF-8"
> >
> >     No, it stores the direction of the decision function, to match the
> >     API of the linear models.
> >
> >     HTH
> >     Alex
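Alex's answer can be illustrated with a short sketch, assuming a recent scikit-learn and using the bundled iris dataset only as example data. For a multiclass problem, coef_ has one row per class, decision_function is the affine map X @ coef_.T + intercept_, and predict picks the highest-scoring class, as in the other linear classifiers.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes
lda = LinearDiscriminantAnalysis().fit(X, y)

# coef_ has one row per class, like other multiclass linear classifiers
print(lda.coef_.shape)  # (3, 4)

# decision_function is the affine map X @ coef_.T + intercept_ ...
scores = X @ lda.coef_.T + lda.intercept_
print(np.allclose(scores, lda.decision_function(X)))  # True

# ... and predict returns the class with the highest decision score
best = lda.classes_[lda.decision_function(X).argmax(axis=1)]
print(np.array_equal(best, lda.predict(X)))  # True
```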
> >
> >
> >
> >     ------------------------------
> >
> >     Message: 5
> >     Date: Mon, 16 Oct 2017 17:02:46 +0200
> >     From: Serafeim Loukas <seralouk at gmail.com>
> >     To: Scikit-learn mailing list <scikit-learn at python.org>
> >     Subject: Re: [scikit-learn] Question about LDA's coef_ attribute
> >     Message-ID: <413210D2-56AE-41A4-873F-D171BB36539D at gmail.com>
> >     Content-Type: text/plain; charset="us-ascii"
> >
> >     Dear Alex,
> >
> >     Thank you for the prompt response.
> >
> >     Are the eigenvectors stored in some variable?
> >     Does the lda.scalings_ attribute contain the eigenvectors?
> >
> >     Best,
> >     Serafeim
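One way to explore this question, assuming the default solver='svd' and a recent scikit-learn, is to check that transform projects the data, centered with the overall mean xbar_, onto the columns of scalings_. This reflects how recent versions implement transform, so treat it as an assumption to re-check against your installed version; the sketch says nothing about the other solvers, and iris is used only as example data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis(solver="svd").fit(X, y)

# transform keeps at most n_classes - 1 = 2 discriminant axes for iris
X_proj = lda.transform(X)
n_axes = X_proj.shape[1]

# Assumed behavior (solver="svd"): center with xbar_, then project onto scalings_
manual = (X - lda.xbar_) @ lda.scalings_[:, :n_axes]

print(X_proj.shape)                 # (150, 2)
print(np.allclose(X_proj, manual))  # True
```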
> >
> >
> >     ------------------------------
> >
> >     Subject: Digest Footer
> >
> >     _______________________________________________
> >     scikit-learn mailing list
> >     scikit-learn at python.org
> >     https://mail.python.org/mailman/listinfo/scikit-learn
> >
> >
> >     ------------------------------
> >
> >     End of scikit-learn Digest, Vol 19, Issue 25
> >     ********************************************
>
> ------------------------------
>
> End of scikit-learn Digest, Vol 19, Issue 28
> ********************************************
>