basic Python question

Fri May 8 15:46:24 EDT 2020

On 2020-05-08 20:02, joseph pareti wrote:
> In general I prefer doing:
> 
> 
> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
> clf = RandomForestClassifier(n_estimators=100, max_depth=None)
> clf_f = clf.fit(X_train, y_train)
> predicted_labels = clf_f.predict(X_test)
> score = clf.score(X_test, y_test)
> score1 = metrics.accuracy_score(y_test, predicted_labels)
> 
> 
> rather than:
> 
> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
> clf0 = RandomForestClassifier(n_estimators=100, max_depth=None)
> clf0.fit(X_train, y_train)
> y_pred = clf0.predict(X_test)
> score = metrics.accuracy_score(y_test, y_pred)
> 
> 
> Are the two codes really equivalent?
> 
You didn't give any context or say what package you're using!

After searching for "RandomForestClassifier", I'm guessing that you're
using scikit-learn.

The documentation for fit() here:

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier.fit

says:

     Returns: self : object

so it looks like clf.fit(...) returns clf.

That being the case, then, yes, they're equivalent: clf_f is just another
name for clf, so both versions fit and use the same classifier.
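
If you want to convince yourself, here is a minimal sketch of the "fit
returns self" pattern. MajorityClassifier is a made-up stand-in written
only for this illustration; it is not scikit-learn's implementation, and
the tiny data set is invented. The point is just that when fit() ends
with "return self", the name on the left of the assignment refers to the
very same object.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical toy estimator for illustration; not part of scikit-learn."""

    def fit(self, X, y):
        # Remember the most common label seen during training.
        self.majority_ = Counter(y).most_common(1)[0][0]
        # Returning self is what makes clf_f = clf.fit(...) work.
        return self

    def predict(self, X):
        # Predict the remembered label for every sample.
        return [self.majority_ for _ in X]


# Invented toy data, just enough to call fit() and predict().
X_train, y_train = [[0], [1], [2]], ["a", "a", "b"]
X_test = [[3], [4]]

clf = MajorityClassifier()
clf_f = clf.fit(X_train, y_train)

# Both names refer to the same fitted object, so predictions agree.
same_object = clf_f is clf
same_predictions = clf_f.predict(X_test) == clf.predict(X_test)
print(same_object, same_predictions)
```

With a real RandomForestClassifier you could run the same check, i.e.
evaluate "clf_f is clf" after the call to fit().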