I have a dataset like this for each ID:
| Months | ID | AnnualSalaryChange |
|---|---|---|
| 2020-12-01 | 1 | 0 |
| 2020-11-01 | 1 | 1 |
| 2020-10-01 | 1 | 0 |
| 2020-09-01 | 1 | 0 |
| 2020-08-01 | 1 | 0 |
| 2020-07-01 | 1 | 0 |
| 2020-06-01 | 1 | 0 |
| 2020-05-01 | 1 | 0 |
| 2020-04-01 | 1 | 0 |
| 2020-03-01 | 1 | 1 |
| 2020-02-01 | 1 | 0 |
| 2020-01-01 | 1 | 0 |
| 2019-12-01 | 1 | 1 |
| 2019-11-01 | 1 | 0 |
| 2019-10-01 | 1 | 0 |
| 2019-09-01 | 1 | 0 |
| 2019-08-01 | 1 | 0 |
| 2019-07-01 | 1 | 0 |
| 2019-06-01 | 1 | 0 |
| 2019-05-01 | 1 | 0 |
| 2019-04-01 | 1 | 0 |
| 2019-03-01 | 1 | 0 |
| 2019-02-01 | 1 | 1 |
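I want a column like AnnualSalaryChangeSumFor12Months. For each row, it should hold the running sum of the AnnualSalaryChange values over the past 12 months (the window length is variable). That is, each row should look back 12 months and sum the values up to and including that row. If fewer than 12 rows are available, it can simply sum the rows that exist.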
| Months | ID | AnnualSalaryChange | AnnualSalaryChangeSumFor12Months |
|---|---|---|---|
| 2020-12-01 | 1 | 0 | 2 |
| 2020-11-01 | 1 | 1 | 3 |
| 2020-10-01 | 1 | 0 | 2 |
| 2020-09-01 | 1 | 0 | 2 |
| 2020-08-01 | 1 | 0 | 2 |
| 2020-07-01 | 1 | 0 | 2 |
| 2020-06-01 | 1 | 0 | 2 |
| 2020-05-01 | 1 | 0 | 2 |
| 2020-04-01 | 1 | 0 | 2 |
| 2020-03-01 | 1 | 1 | 2 |
| 2020-02-01 | 1 | 0 | 1 |
| 2020-01-01 | 1 | 0 | 2 |
| 2019-12-01 | 1 | 1 | 2 |
| 2019-11-01 | 1 | 0 | 1 |
| 2019-10-01 | 1 | 0 | 1 |
| 2019-09-01 | 1 | 0 | 1 |
| 2019-08-01 | 1 | 0 | 1 |
| 2019-07-01 | 1 | 0 | 1 |
| 2019-06-01 | 1 | 0 | 1 |
| 2019-05-01 | 1 | 0 | 1 |
| 2019-04-01 | 1 | 0 | 1 |
| 2019-03-01 | 1 | 0 | 1 |
| 2019-02-01 | 1 | 1 | 1 |
I tried:
df['ChangeSum']=df.groupby(['ID','Year'])['AnnualSalaryChange'].cumsum()
But it does not give me what I actually want. How can I do this? Thanks.
Try this:
df = df.sort_values(['Months'])
df['AnnualSalaryChangeSumFor12Months'] = df['AnnualSalaryChange'].rolling(window=12, min_periods=1).sum()
df = df.sort_values(['Months'], ascending=False)
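To check the snippet above against the expected output in the question, here is a minimal, self-contained sketch. It rebuilds the single-ID sample data by hand (the `months` and `changes` helper names are mine, not from the question) and assumes there is exactly one row per month with no gaps, since `rolling(window=12)` counts rows, not calendar months. The cast to `int` is an addition for readability only.

```python
import pandas as pd

# Sample data from the question: one row per month for ID 1, newest first.
months = pd.date_range("2019-02-01", "2020-12-01", freq="MS")[::-1]
changes = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,   # 2020-12 .. 2020-01
           1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]      # 2019-12 .. 2019-02
df = pd.DataFrame({"Months": months, "ID": 1, "AnnualSalaryChange": changes})

# Sort oldest first so the rolling window looks backwards in time.
df = df.sort_values("Months")

# Sum the current row and up to 11 preceding rows; min_periods=1 lets the
# earliest rows sum whatever history exists instead of producing NaN.
df["AnnualSalaryChangeSumFor12Months"] = (
    df["AnnualSalaryChange"].rolling(window=12, min_periods=1).sum().astype(int)
)

# Restore the newest-first order used in the question.
df = df.sort_values("Months", ascending=False)
print(df.to_string(index=False))
```

Note that this version has no grouping, so it is only appropriate when the frame holds a single ID.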
With several IDs, the groupby version is very similar, just a little longer:
# Sort oldest first within each ID and reset the index so the rolling result aligns on assignment
df = df.sort_values(['ID', 'Months']).reset_index(drop=True)
df['AnnualSalaryChangeSumFor12Months'] = df.groupby('ID')['AnnualSalaryChange'].rolling(window=12, min_periods=1).sum().reset_index(level=0, drop=True)
df = df.sort_values(['ID', 'Months'], ascending=False)
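As an alternative way to write the per-ID version, the sketch below uses `groupby(...).transform(...)`, which returns a result on the original index and so avoids the index bookkeeping. The two-ID data is made up for illustration, the column name `ChangeSum` is borrowed from the question's attempt, and the window is shortened to 3 (a hypothetical value) so the effect is visible on a small sample. It again assumes one row per month per ID.

```python
import pandas as pd

# Hypothetical two-ID frame; the values are invented for illustration.
df = pd.DataFrame({
    "Months": list(pd.date_range("2020-01-01", periods=4, freq="MS")) * 2,
    "ID": [1] * 4 + [2] * 4,
    "AnnualSalaryChange": [1, 0, 1, 0, 0, 1, 1, 1],
})

window = 3  # use 12 for the question's case

# Oldest first within each ID, so each window looks backwards in time.
df = df.sort_values(["ID", "Months"])

# transform() applies the rolling sum per ID and keeps the original index,
# so the result can be assigned directly without resetting any index.
df["ChangeSum"] = (
    df.groupby("ID")["AnnualSalaryChange"]
      .transform(lambda s: s.rolling(window=window, min_periods=1).sum())
      .astype(int)
)
print(df.to_string(index=False))
```

The first row of ID 2 gets 0 rather than picking up trailing values from ID 1, which shows that the window restarts for each group.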