Incremental PCA

Sat Apr 18 13:01:52 EDT 2020

Rahul Gupta wrote at 2020-4-18 02:56 -0700:
>i wanted to implement incremental PCA.
>Got this code from Stack Overflow but i am wondering what y = chunk.pop("Y") does and what is this argument "Y" to pop

The key to answering your question is: what is the type of `chunk`?
Either use Python's introspection (--> "type(chunk)") or read the
documentation of `pandas.read_csv`.
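
A quick way to do the check, as a minimal sketch: the small in-memory CSV
below is only a stand-in for your real file, and its column names are made
up for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV file; the column names are made up.
csv_text = "a,b,Y\n1,2,0\n3,4,1\n5,6,0\n"

# Same call shape as in the quoted script, with a tiny chunk size.
reader = pd.read_csv(io.StringIO(csv_text), sep=",", chunksize=2)

# Take the first chunk and look at what kind of object it is.
chunk = next(iter(reader))
print(type(reader).__name__)
print(type(chunk).__name__)
```

Once the type is known, "help(type(chunk).pop)" in an interactive session
shows the documentation of its `pop` method.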

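With `chunksize` given, `pandas.read_csv` returns an iterator over
`pandas.DataFrame` objects, so `chunk` is most likely a `DataFrame`.
Its `pop` behaves like `pop` on a mapping (such as a `dict`): it removes
the given key (here the column "Y") and returns its value (the column).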
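
To make the `pop` behaviour concrete, here is a small sketch. It assumes
the chunks are pandas `DataFrame` objects (as they are when `read_csv` is
called with `chunksize`); the data and column names are invented for the
example.

```python
import io
import pandas as pd

# pop on a plain dict: removes the key and returns its value.
d = {"a": 1, "Y": 0}
y_value = d.pop("Y")

# Made-up data standing in for the real file.
csv_text = "a,b,Y\n1,2,0\n3,4,1\n5,6,0\n7,8,1\n"

labels = []
remaining_columns = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # pop on a DataFrame: removes the column "Y" from chunk in place
    # and returns it as a Series.
    y = chunk.pop("Y")
    labels.extend(y.tolist())
    # What is left in chunk are the feature columns only.
    remaining_columns = list(chunk.columns)

print(d, y_value)
print(labels, remaining_columns)
```

In the quoted script this means that "Y" (presumably the label column) is
split off, so that only the remaining columns are passed to `partial_fit`.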

>from sklearn.decomposition import IncrementalPCA
>import csv
>import sys
>import numpy as np
>import pandas as pd
>
>dataset = sys.argv[1]
>chunksize_ = 5 * 25000
>dimensions = 300
>
>reader = pd.read_csv(dataset, sep = ',', chunksize = chunksize_)
>sklearn_pca = IncrementalPCA(n_components=dimensions)
>for chunk in reader:
>    y = chunk.pop("Y")
>    sklearn_pca.partial_fit(chunk)