Because DataFrame.update does not work well here, an alternative is to first use DataFrame.merge with a left join to bring in the new columns from the second DataFrame:
df2 = df.merge(df1.drop_duplicates('cod_t').rename(columns={'cod_t':'code'}),
               on='code',
               how='left',
               suffixes=('','_'))
print (df2)
       ID  country    money  code  money_add         other             time \
0  832932    Other      NaN   0.0        NaN  [N2, N2, N4]  0 days 01:37:00
1  217#8#      NaN      NaN   NaN        NaN  [N1, N2, N3]  2 days 01:01:00
2  1329T2   France  12131.0  20.0     3452.0      [N1, N1]  1 days 03:55:00
3  124932   France      NaN  16.0        NaN          [N2]  0 days 01:28:00
4  194022   France      NaN   0.0        NaN      [N4, N3]  3 days 02:35:00

   money_  money_add_
0  4532.0     72323.0
1     NaN         NaN
2  1813.0     27328.0
3  1213.0     23822.0
4  4532.0     72323.0
Then get the column names with and without the trailing _:
cols_with_ = df2.columns[df2.columns.str.endswith('_')]
cols_without_ = cols_with_.str.rstrip('_')
print (cols_with_)
Index(['money_', 'money_add_'], dtype='object')
print (cols_without_)
Index(['money', 'money_add'], dtype='object')
Pass them to DataFrame.combine_first, which fills only the missing values, and finally remove the helper columns:
df2[cols_without_] = (df2[cols_without_].combine_first(df2[cols_with_]
                                        .rename(columns=lambda x: x.rstrip('_'))))
df2 = df2.drop(cols_with_, axis=1)
print (df2)
       ID  country    money  code  money_add         other             time
0  832932    Other   4532.0   0.0    72323.0  [N2, N2, N4]  0 days 01:37:00
1  217#8#      NaN      NaN   NaN        NaN  [N1, N2, N3]  2 days 01:01:00
2  1329T2   France  12131.0  20.0     3452.0      [N1, N1]  1 days 03:55:00
3  124932   France   1213.0  16.0    23822.0          [N2]  0 days 01:28:00
4  194022   France   4532.0   0.0    72323.0      [N4, N3]  3 days 02:35:00
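
For reuse, the three steps can be wrapped in one function. This is only a minimal sketch: the helper name fill_from_lookup and the small sample frames are hypothetical and are not the data from the question. It also assumes that no original column name already ends with _, since such a column would be mistaken for a helper column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df and df1 (not the original data).
df = pd.DataFrame({
    'ID': ['a', 'b', 'c', 'd'],
    'money': [np.nan, np.nan, 12131.0, np.nan],
    'code': [0.0, np.nan, 20.0, 16.0],
    'money_add': [np.nan, np.nan, 3452.0, np.nan],
})
df1 = pd.DataFrame({
    'cod_t': [0.0, 16.0, 20.0, 0.0],
    'money': [4532.0, 1213.0, 1813.0, 9999.0],
    'money_add': [72323.0, 23822.0, 27328.0, 8888.0],
})

def fill_from_lookup(left, right, left_key, right_key):
    """Fill NaNs in `left` from `right`, matching left_key to right_key.

    Hypothetical helper; assumes no column in `left` ends with '_'.
    """
    # Keep the first row per key so the left join cannot duplicate rows.
    lookup = (right.drop_duplicates(right_key)
                   .rename(columns={right_key: left_key}))
    merged = left.merge(lookup, on=left_key, how='left', suffixes=('', '_'))

    # Helper columns carry the '_' suffix; targets are the same names without it.
    helper = merged.columns[merged.columns.str.endswith('_')]
    target = helper.str.rstrip('_')

    # combine_first keeps existing values and fills only the missing ones.
    filled = merged[target].combine_first(
        merged[helper].rename(columns=lambda c: c.rstrip('_')))
    merged[target] = filled[target]
    return merged.drop(columns=helper)

result = fill_from_lookup(df, df1, 'code', 'cod_t')
print(result)
```

Existing values such as 12131.0 are kept, rows whose key has no match stay NaN, and the duplicated key 0.0 in the lookup frame uses its first row only.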