IndexError for using pandas dataframe values

Peter Otten __peter__ at web.de
Sat May 28 03:25:38 EDT 2016


Peter Otten wrote:

> Daiyue Weng wrote:
> 
>> Hi, I tried to use DataFrame.values to convert a list of columns in a
>> dataframe to a numpy ndarray/matrix,
>> 
>> matrix = df.values[:, list_of_cols]
>> 
>> but got an error,
>> 
>> IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis
>> (None) and integer or boolean arrays are valid indices
>> 
>> so what's the problem with the list of columns I passed in?
>> 
>> many thanks
> 
> Your suggestively named list_of_cols is probably not a list. Have your
> script print its value and type before the failing operation:
> 
>   print(type(list_of_cols), list_of_cols)
>> matrix = df.values[:, list_of_cols]

On Thu May 26 2016, 09:21:59 Daiyue Weng wrote:

[If you had sent this to the list I would have seen it earlier. 
Just in case you didn't solve the problem in the meantime:]

> it prints
> 
> <class 'list'> ['key1', 'key2']

So my initial assumption was wrong: list_of_cols is a list. However, 
df.values is a plain numpy array that no longer carries the column labels, 
and therefore it only accepts integer (or boolean) indices:

>>> df = pd.DataFrame([[1,2,3],[4,5,6]], columns="key1 key2 key3".split())
>>> df
   key1  key2  key3
0     1     2     3
1     4     5     6

[2 rows x 3 columns]
>>> df.values
array([[1, 2, 3],
       [4, 5, 6]])
>>> df.values[["key1", "key2"]]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'key1'

(I get a different error message, probably because we use different versions 
of numpy)
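Because the exact exception depends on the numpy version, here is a version-agnostic check that label indexing fails on the bare array. This is a minimal sketch that assumes any reasonably recent pandas/numpy. The helper name `try_label_index` is hypothetical and exists only for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["key1", "key2", "key3"])
arr = df.values  # plain numpy.ndarray: the column labels are gone

def try_label_index(array, labels):
    """Hypothetical helper: return the exception type raised when
    indexing the columns of a numpy array with string labels."""
    try:
        array[:, labels]
    except (IndexError, ValueError) as exc:  # type/message varies by numpy version
        return type(exc)
    return None

print(isinstance(arr, np.ndarray))
print(try_label_index(arr, ["key1", "key2"]))
```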

To fix the problem you can either use integer column positions

>>> df.values[:,[0, 1]]
array([[1, 2],
       [4, 5]])
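If you only have the column names, pandas can translate them into those integer positions for you. The sketch below uses `Index.get_indexer`, which returns -1 for labels it cannot find. It assumes the column labels are unique, because `get_indexer` refuses non-unique indexes. The explicit `KeyError` check is my own addition for illustration, not something pandas does for you here.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["key1", "key2", "key3"])
list_of_cols = ["key1", "key2"]

# Translate column labels into integer positions; unknown labels map to -1.
positions = df.columns.get_indexer(list_of_cols)
if (positions == -1).any():
    missing = [c for c, p in zip(list_of_cols, positions) if p == -1]
    raise KeyError(f"unknown columns: {missing}")

# Integer positions are valid numpy indices.
matrix = df.values[:, positions]
print(positions)
print(matrix)
```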

or select the columns in pandas:

>>> df[["key1", "key2"]].values
array([[1, 2],
       [4, 5]])
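With newer pandas versions the same label-based selection is often written with `.loc` and `DataFrame.to_numpy()` (available since pandas 0.24), which the pandas documentation recommends over `.values`. This is a minimal sketch assuming such a version is installed.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["key1", "key2", "key3"])
list_of_cols = ["key1", "key2"]

# Select the columns by label in pandas first, then convert to a numpy array.
matrix = df.loc[:, list_of_cols].to_numpy()
print(matrix)
```

Selecting in pandas before converting keeps the labels in play until the last step, so a misspelled column name raises a `KeyError` naming the problem instead of a numpy indexing error.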


More information about the Python-list mailing list