I have the following two columns from a Pandas DataFrame:
antecedents consequents
apple orange
orange apple
apple water
apple pineapple
water lemon
lemon water
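I would like to remove duplicates where the same pair appears as both antecedents and consequents (in either order), keeping only the first occurrence, so that I get: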
antecedents consequents
apple orange
apple water
apple pineapple
water lemon
How can I achieve this using Pandas?
Apply frozenset across both columns so that each row becomes an order-insensitive key, then test for duplicates with Series.duplicated:
df2 = df[~df[['antecedents','consequents']].apply(frozenset,axis=1).duplicated()]
Or sort the values in each row with numpy.sort:
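import numpy as np
import pandas as pd
# Sort each row so that (a, b) and (b, a) become identical, then drop the repeats
df1 = pd.DataFrame(np.sort(df[['antecedents','consequents']], axis=1), index=df.index)
df2 = df[~df1.duplicated()]
print(df2)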
antecedents consequents
0 apple orange
2 apple water
3 apple pineapple
4 water lemon
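For reference, here is a minimal self-contained sketch that rebuilds the sample data from the question and runs both approaches side by side. The variable names (`cols`, `key`, `sorted_pairs`, `by_frozenset`, `by_sort`) are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "antecedents": ["apple", "orange", "apple", "apple", "water", "lemon"],
    "consequents": ["orange", "apple", "water", "pineapple", "lemon", "water"],
})
cols = ["antecedents", "consequents"]

# Approach 1: turn each row into a frozenset, which ignores column order,
# then keep only the rows whose key has not been seen before
key = df[cols].apply(frozenset, axis=1)
by_frozenset = df[~key.duplicated()]

# Approach 2: sort the two values within each row so that reversed pairs
# become identical rows, then test those sorted rows for duplicates
sorted_pairs = pd.DataFrame(np.sort(df[cols].to_numpy(), axis=1), index=df.index)
by_sort = df[~sorted_pairs.duplicated()]

print(by_frozenset)
print(by_sort.equals(by_frozenset))
```

Both variants keep the rows with index 0, 2, 3 and 4 on this data. The frozenset version only needs the values to be hashable, while the numpy.sort version needs the values in a row to be comparable with each other, so it may not suit columns that mix types.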