How to fit & predict in Cat-boost Algorithm

Wed Mar 18 16:00:48 EDT 2020

> On 18 Mar 2020, at 08:59, princit <princit at gmail.com> wrote:
> 
> I am new in python. I am trying to predict the "time_to_failure" for given "acoustic_data" in the test CSV file using catboost algorithm.
> 
> 
> def catbostregtest(X_train, y_train):  
>    # submission format
>    submission = pd.read_csv('sample_submission.csv', index_col='seg_id')
>    X_test = pd.DataFrame()
>    # prepare test data
>    for seg_id in submission.index:
>        seg = pd.read_csv('test/' + seg_id + '.csv')
>        ch = gen_features(seg['acoustic_data'])
>        X_test = X_test.append(ch, ignore_index=True)
>    # model of choice here
>    model = CatBoostRegressor(iterations=10000, loss_function='MAE', boosting_type='Ordered')
>    model.fit(X_train, y_train)      #error line
>    y_hat = model.predict(X_test)    #error line
>    # write submission file LSTM
>    submission['time_to_failure'] = y_hat
>    submission.to_csv('submissionCAT.csv')
>    print(model.best_score_)
> 
> 
> 
> This function "catbostregtest" is giving me error with the errorlog

Have you checked the documentation for the library you are using?
Its clear that the author knows something is wrong, hence the error:

> CatBoostError: c:/goagent/pipelines/buildmaster/catboost.git/catboost/libs/data/quantization.cpp:2424: All features are either constant or ignored.

Hopeful the author documented the error to guide you.

Barry

> 
> Traceback (most recent call last):
> File "E:\dir\question.py", line 68, in main()
> File "E:\dir\question.py", line 65, in main catbostregtest(X_train, y_train)
> File "E:\dir\question.py", line 50, in catbostregtest model.fit(X_train, y_train)
> File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\catboost\core.py", line 4330, in fit save_snapshot, snapshot_file, snapshot_interval, init_model)
> File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\catboost\core.py", line 1690, in _fit train_params["init_model"]
> File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\catboost\core.py", line 1225, in _train self._object._train(train_pool, test_pool, params, allow_clear_pool, init_model._object if init_model else None)
> File "_catboost.pyx", line 3870, in _catboost._CatBoost._train
> File "_catboost.pyx", line 3916, in _catboost._CatBoost._train
> CatBoostError: c:/goagent/pipelines/buildmaster/catboost.git/catboost/libs/data/quantization.cpp:2424: All features are either constant or ignored.
> This is gen_features function
> 
> def gen_features(X):
> 
>    strain = []
> 
>    strain.append(X.mean())
> 
>    strain.append(X.std())
> 
>    strain.append(X.min())
> 
>    strain.append(X.max())
> 
>    strain.append(X.kurtosis())
> 
>    strain.append(X.skew())
> 
>    strain.append(np.quantile(X,0.01))
> 
>    strain.append(np.quantile(X,0.05))
> 
>    strain.append(np.quantile(X,0.95))
> 
>    strain.append(np.quantile(X,0.99))
> 
>    strain.append(np.abs(X).max())
> 
>    strain.append(np.abs(X).mean())
> 
>    strain.append(np.abs(X).std())
> 
>    return pd.Series(strain)
> 
> 
> This function is called from the main function
> 
> def main():
>    train1 = pd.read_csv('train.csv', iterator=True, chunksize=150_000, dtype={'acoustic_data': np.int16, 'time_to_failure': np.float64})
>    X_train = pd.DataFrame()
>    y_train = pd.Series() 
>    for df in train1: 
>        ch = gen_features(df['acoustic_data']) 
>        X_train = X_train.append(ch, ignore_index=True)
>        y_train = y_train.append(pd.Series(df['time_to_failure'].values[-1])) 
>    catbostregtest(X_train, y_train)
> 
> How I can remove the error that occur during making predict from catboost model? How I can remove this error please help. You can download and run the project in spyder ide from this link https://drive.google.com/file/d/1JFsNfE22ef82e-dS0zsZHDE3zGJxnJ_J/view?usp=sharing or https://github.com/princit/catboostAlgorithm
> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>