So I know you can use something like this to remove duplicate rows:
the_data.drop_duplicates(subset=['the_key'])
However, if the_key is null for some values, as shown below:
   the_key  C  D
1      NaN  *  *
2      NaN     *
3      111  *  *
4      111
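It will keep the rows marked in the C column. Is it possible to get drop_duplicates to treat every NaN as distinct, so that the output keeps the rows marked in the D column?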
Use duplicated (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.duplicated.html) chained with isna (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.isna.html) and filter by boolean indexing (http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing):
df = df[(~df['the_key'].duplicated()) | df['the_key'].isna()]
# for older pandas versions
#df = df[(~df['the_key'].duplicated()) | df['the_key'].isnull()]
print(df)
   the_key  C  D
1      NaN  *  *
2      NaN     *
3    111.0  *  *
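
For anyone who wants to try this end to end, here is a minimal, self-contained sketch that compares the default drop_duplicates behaviour with the mask above. The sample frame and the `value` column are made up for illustration. Only the_key comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the keys in the question's table;
# the "value" column is invented purely for illustration.
df = pd.DataFrame(
    {"the_key": [np.nan, np.nan, 111, 111], "value": ["a", "b", "c", "d"]},
    index=[1, 2, 3, 4],
)

# Default behaviour: missing keys are treated as equal to each other,
# so only the first NaN row survives.
default_result = df.drop_duplicates(subset=["the_key"])

# Keep a row if its key is a first occurrence OR the key is missing.
keep = ~df["the_key"].duplicated() | df["the_key"].isna()
nan_distinct_result = df[keep]

print(default_result.index.tolist())       # [1, 3]
print(nan_distinct_result.index.tolist())  # [1, 2, 3]
```

The mask only looks at a single column. For a multi-column subset, the same idea should carry over by calling duplicated(subset=...) on the frame and combining it with a row-wise missing-value check, though that variant is not covered in the answer above.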