[scikit-learn] Creating dataset

Matthieu Brucher matthieu.brucher at gmail.com
Sun Nov 8 08:41:53 EST 2020


data_file["data"] works only if your CSV actually has a column named
"data" (and likewise for "target" and "feature_names").
read_csv can do exactly what you need, but you have to adapt the
script to what is in your CSV (which only you know!).
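
A minimal sketch of adapting the tutorial's DataFrame construction to a CSV. It assumes, hypothetically, that the file has a header row with one column per feature plus a label column named "target"; the inline CSV text and the column names f1, f2, f3 are made up for illustration. load_iris() returns a dict-like Bunch with separate "data", "target" and "feature_names" entries, while pd.read_csv returns a single DataFrame keyed by the CSV's own headers, so the same pieces have to be selected by column name:

```python
import io

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# load_iris() bundles the feature matrix, labels and feature names separately.
iris = load_iris()
print(iris["data"].shape)  # (150, 4)

# Hypothetical stand-in for mydata.csv: header row, feature columns,
# and a label column assumed to be called "target".
csv_text = """f1,f2,f3,target
5.1,3.5,1.4,0
4.9,3.0,1.4,0
6.3,3.3,6.0,2
"""
data_file = pd.read_csv(io.StringIO(csv_text))

# read_csv gives one DataFrame whose keys are the CSV headers, so
# data_file["data"] raises KeyError unless a "data" column exists.
print("data" in data_file.columns)  # False

# Select the pieces load_iris() provides, by column name.
feature_names = [c for c in data_file.columns if c != "target"]
X = data_file[feature_names].to_numpy()
y = data_file["target"].to_numpy()

# Rebuild the tutorial's DataFrame from those pieces.
df = pd.DataFrame(data=np.c_[X, y], columns=feature_names + ["target"])
print(df.shape)  # (3, 4)
```

If the CSV is already laid out as "features, then target", data_file itself has that shape, and the np.c_ step only mirrors the tutorial's code.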
You need to understand what each statement is doing, just as you
need to understand what processing you apply to your data (whether
it's preprocessing or learning) to use any machine learning tool
properly.
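
As a rough illustration of the "preprocessing or learning" distinction, here is a minimal sketch that separates the features X from the target y, standardizes X (preprocessing) and then fits a PCA (learning). It uses the built-in iris data so it runs on its own; with a CSV, X and y would come from its columns instead. The choice of StandardScaler and n_components=2 is an assumption for illustration, and the tutorial's exact steps may differ:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# DataFrame shaped like the tutorial's: feature columns plus "target".
iris = load_iris()
df = pd.DataFrame(iris["data"], columns=iris["feature_names"])
df["target"] = iris["target"]

# Separate features from target; PCA is unsupervised and only sees X.
X = df.drop(columns="target")
y = df["target"]

# Preprocessing: center each feature and scale it to unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Learning: fit PCA and project onto the first two principal components.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print(X_pca.shape)  # (150, 2)
print(pca.explained_variance_ratio_)
```

Whether to standardize at all depends on your features (for example, whether they share units), which is exactly the kind of decision you need to understand before trusting the result.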

Matthieu

On Sun, Nov 8, 2020 at 12:44, Mahmood Naderan <mahmood.nt at gmail.com> wrote:
>
> Thanks for the replies.
>
> >I'd recommend just reading that csv file with e.g. pandas
> >(https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html),
> >and then just use the dataframe as input to scikit-learn utilities (you may need to
> >separate the features X from the target y).
>
>
> I am trying to follow the steps as described in https://towardsdatascience.com/a-step-by-step-introduction-to-pca-c0d78e26a0dd
>
> I changed
>
> iris = load_iris()
> colors = ["blue","red","green"]
> df = DataFrame(
>     data=np.c_[iris["data"],  iris["target"]], columns= iris["feature_names"] + ["target"])
>
> to
>
> data_file = pd.read_csv("mydata.csv")
> colors = ["blue","red","green","skyblue","indigo","plum","coral","orange","gray","lime"]
> df = DataFrame(
>     data=np.c_[data_file["data"], data_file["target"]], columns=data_file["feature_names"] + ["target"])
>
>
> But I get this error:
>
> Traceback (most recent call last):
>   File "/home/mahmood/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
>     return self._engine.get_loc(casted_key)
>   File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
>   File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
>   File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
>   File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
> KeyError: 'data'
>
> The above exception was the direct cause of the following exception:
>
> Traceback (most recent call last):
>   File "pca_gromacs.py", line 12, in <module>
>     data=np.c_[data_file["data"], data_file["target"]], columns=data_file["feature_names"] + ["target"]
>   File "/home/mahmood/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2906, in __getitem__
>     indexer = self.columns.get_loc(key)
>   File "/home/mahmood/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
>     raise KeyError(key) from err
> KeyError: 'data'
>
>
>
> It seems that load_iris() does more than read_csv().
>
> Regards,
> Mahmood
>
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn



-- 
Quantitative researcher, Ph.D.
Blog: http://blog.audio-tk.com/
LinkedIn: http://www.linkedin.com/in/matthieubrucher

