pandas finding field difference between two dataframes
zljubisic at gmail.com
zljubisic at gmail.com
Mon Oct 30 04:25:14 EDT 2017
Hi,
I have to compare two pandas dataframes and find difference between each row.
For example, in the code bellow, rows with index 0 and 3 are intentionally different.
Row 0 is different in field A, and row 3 is different in field 3.
After merging dataframes, I can concentrate to the dfm with dfm['_merge'] != 'both'.
How to find which field is different?
End result should be:
row 0, A
row 3, B
Regards.
import pandas as pd
import numpy as np
idx = pd.date_range('01.01.2017', periods=7, freq='D')
A = [False, True, True, False, True, False, True]
B = np.random.randn(7)
C = np.random.randn(7)
data1 = { 'A' : [False, True, True, False, True, False, True],
'B' : list(range(len(idx))),
'C' : list(range(len(idx), len(idx) + len(idx)))
}
dfl = pd.DataFrame(data1, index=idx)
data2 = data1.copy()
data2['A'][0] = True
data2['B'][3] = 30
dfr = pd.DataFrame(data2, index=idx)
# dfm = dfl.merge(dfr, indicator=True, how='outer', left_on=dfl.index, right_on = dfr.index)
dfm = dfl.merge(dfr, indicator=True, how='outer')
print(dfl)
print(dfr)
print(dfm)
print(dfl.iloc[[0,3]])
print(dfr.iloc[[0,3]])
Output:
A B C
2017-01-01 False 0 7
2017-01-02 True 1 8
2017-01-03 True 2 9
2017-01-04 False 3 10
2017-01-05 True 4 11
2017-01-06 False 5 12
2017-01-07 True 6 13
A B C
2017-01-01 True 0 7
2017-01-02 True 1 8
2017-01-03 True 2 9
2017-01-04 False 30 10
2017-01-05 True 4 11
2017-01-06 False 5 12
2017-01-07 True 6 13
A B C _merge
0 False 0 7 left_only
1 True 1 8 both
2 True 2 9 both
3 False 3 10 left_only
4 True 4 11 both
5 False 5 12 both
6 True 6 13 both
7 True 0 7 right_only
8 False 30 10 right_only
Backend TkAgg is interactive backend. Turning interactive mode on.
A B C
2017-01-01 False 0 7
2017-01-04 False 3 10
A B C
2017-01-01 True 0 7
2017-01-04 False 30 10
More information about the Python-list
mailing list