pandas (in jupyter?) problem

Paulo da Silva p_d_a_s_i_l_v_a_ns at nonetnoaddress.pt
Fri May 6 15:11:07 EDT 2022


Hi all!

I'm having the following problem. Consider the code below (the commented-out 
loop and the uncommented version should, I think, do the same thing):

#for col in missing_cols:
#    df[col] = np.nan

df=df.copy()
df[missing_cols]=np.nan

df has about 20000 columns and len(missing_cols) is about 18000.

I'm getting lots (one per missing column?) of the following message from 
ipykernel:

"PerformanceWarning: DataFrame is highly fragmented.  This is usually 
the result of calling `frame.insert` many times, which has poor 
performance.  Consider joining all columns at once using 
pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = 
frame.copy()`
   df[missing_cols]=np.nan"


At first I didn't have df=df.copy(). I added it later, but the problem remains.
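
For reference, the warning's own suggestion (joining all columns at once with pd.concat(axis=1)) might look roughly like the sketch below. The small frame and the column names are hypothetical stand-ins for the real data, and this has not been timed on a 20000-column frame, so treat it as an illustration of the idea rather than a confirmed fix.

```python
import numpy as np
import pandas as pd

# Hypothetical small stand-in for the real frame and the real missing_cols.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
missing_cols = ["c", "d", "e"]

# Build every missing column in one NaN-filled frame that shares df's index.
filler = pd.DataFrame(np.nan, index=df.index, columns=missing_cols)

# Attach all of them in a single step instead of inserting column by column.
df = pd.concat([df, filler], axis=1)
```

The point of the single concat is that the new columns arrive together, rather than through one insertion per column.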

This slows down the code a lot, perhaps because Jupyter is spending too 
much time issuing these messages!
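
One way to check whether the messages themselves are the bottleneck would be to silence this warning category around the assignment and time the cell again. The sketch below assumes the warning is pandas' pd.errors.PerformanceWarning and again uses a hypothetical toy frame; it only hides the messages and does not change how the columns are added.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical small stand-in for the real frame and the real missing_cols.
df = pd.DataFrame({"a": [1.0, 2.0]})
missing_cols = ["b", "c"]

# Suppress only pandas' PerformanceWarning, and only inside this block.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=pd.errors.PerformanceWarning)
    df[missing_cols] = np.nan
```

If the cell is still slow with the warnings hidden, the time is more likely going into the column insertions than into printing.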

Thanks for any comments.

