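I want to replace the values in one column of a DataFrame with a more accurate and complete set of values generated from a lookup table that I have prepared as a Series.
I thought I could do it as shown below, but the result is not what I expected.
Here is the DataFrame I want to fix: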
In [6]: df_normalised.head(10)
Out[6]:
code name
0 8 Human development
1 11
2 1 Economic management
3 6 Social protection and risk management
4 5 Trade and integration
5 2 Public sector governance
6 11 Environment and natural resources management
7 6 Social protection and risk management
8 7 Social dev/gender/inclusion
9 7 Social dev/gender/inclusion
(Note the missing name in the second row, index 1.)
Here is the lookup table I created to do the fix:
In [20]: names
Out[20]:
1 Economic management
10 Rural development
11 Environment and natural resources management
2 Public sector governance
3 Rule of law
4 Financial and private sector development
5 Trade and integration
6 Social protection and risk management
7 Social dev/gender/inclusion
8 Human development
9 Urban development
dtype: object
Here is how I thought I could do it:
In [21]: names[df_normalised.head(10).code]
Out[21]:
code
8 Human development
11 Environment and natural resources management
1 Economic management
6 Social protection and risk management
5 Trade and integration
2 Public sector governance
11 Environment and natural resources management
6 Social protection and risk management
7 Social dev/gender/inclusion
7 Social dev/gender/inclusion
dtype: object
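However, I expected the resulting Series above to have the same index as df_normalised (i.e. 0, 1, 2, 3, ...), not an index based on the code values.
So I am not sure how to replace the original values in the 'name' column of df_normalised with the values of this Series, since the indexes differ.
As an aside, how is it possible to have an index with repeated values like the one above?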
You can use the map() function (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.map.html):
In [38]: df_normalised['name'] = df_normalised['code'].map(names)
In [39]: df_normalised
Out[39]:
code name
0 8 Human development
1 11 Environment and natural resources management
2 1 Economic management
3 6 Social protection and risk management
4 5 Trade and integration
5 2 Public sector governance
6 11 Environment and natural resources management
7 6 Social protection and risk management
8 7 Social dev/gender/inclusion
9 7 Social dev/gender/inclusion
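To make the difference between the two approaches concrete, here is a minimal self-contained sketch. The five-row frame and four-entry lookup are a hypothetical miniature of the data in the question, and I am assuming the codes are integers in both the column and the lookup index. Selecting from the lookup by label keeps the lookup's own labels (which is also why the index can contain repeats), whereas map() looks up each value and keeps the caller's index.

```python
import pandas as pd

# Hypothetical miniature of df_normalised; the name at index 1 is missing.
df = pd.DataFrame({
    "code": [8, 11, 1, 6, 11],
    "name": [
        "Human development",
        "",
        "Economic management",
        "Social protection and risk management",
        "Environment and natural resources management",
    ],
})

# Lookup Series: the index holds the codes, the values hold the names.
# Assumption: codes are integers here, matching the dtype of df["code"].
names = pd.Series({
    1: "Economic management",
    6: "Social protection and risk management",
    8: "Human development",
    11: "Environment and natural resources management",
})

# Label-based selection returns the selected rows of the lookup, so the
# result is indexed by code, and code 11 appears twice in that index.
by_label = names.loc[df["code"]]
print(by_label.index.tolist())
print(by_label.index.is_unique)

# map() translates each code through the lookup and keeps df's own index,
# so the result can be assigned straight back to the column.
df["name"] = df["code"].map(names)
print(df)
```

A pandas index is not required to be unique, which is why the label-based result is legal even with 11 repeated. Also note that any code not present in the lookup would come back from map() as NaN, so it may be worth checking df["name"].isna() after the assignment.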