First create a helper grouping with GroupBy.cumcount http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.cumcount.html, using ascending=False so that rows are counted within each ID from the last month back to the first; integer division by 3 then turns the counter into 3-month bins. Aggregate each bin with sum, accumulate the bins per ID with GroupBy.cumsum http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.cumsum.html, reshape with DataFrame.unstack http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.unstack.html, flatten the MultiIndex in the columns, and join the result to a DataFrame created by GroupBy.last http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.last.html:
Data:
import pandas as pd

df = pd.DataFrame({'ID': ['KAR1', 'KAR1', 'KAR1', 'KAR1', 'KAR1', 'KAR1', 'KAR1',
                          'KAR1', 'KAR1', 'KAR1', 'KAR1', 'KAR1', 'KAR2', 'KAR2', 'KAR2', 'KAR2', 'KAR2', 'KAR2', 'KAR2', 'KAR2', 'KAR2', 'KAR2', 'KAR2', 'KAR2'],
                   'Year': [20201001, 20201101, 20201201, 20210101, 20210201, 20210301,
                            20210401, 20210501, 20210601, 20210701, 20210801, 20210901,
                            20201001, 20201101, 20201201, 20210101, 20210201, 20210301,
                            20210401, 20210501, 20210601, 20210701, 20210801, 20210901],
                   'R1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 4, 3, 2, 1, 9, 2, 6, 5, 3, 30, 34, 20],
                   'R1_f': [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 9, 8, 7, 6, 5, 4, 3, 2, 1, 2, 3, 4]})
print (df)
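cols = ['R1','R1_f']
# reverse counter per ID (last row = 0), binned into blocks of 3 rows
g = df.groupby('ID').cumcount(ascending=False) // 3
# sum per (ID, bin), running total over the bins of each ID, bins moved to columns
df1 = df.groupby(['ID',g])[cols].sum().groupby(level=0).cumsum().unstack()
print (df1)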
      R1                R1_f
       0   1    2    3     0   1    2    3
ID
KAR1  33  57   72   78    45  81  108  126
KAR2  84  98  110  119     9  15   30   54
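To see where the column labels 0 to 3 come from, it can help to look at the helper labels on their own. The following is a minimal sketch on a made-up 7-row frame; the names demo, rev, bins, per_bin and running are illustrative and not part of the answer above.

```python
import pandas as pd

# Hypothetical frame with a single ID, only to show the bin labels.
demo = pd.DataFrame({'ID': ['A'] * 7, 'val': [1, 2, 3, 4, 5, 6, 7]})

# Reverse counter per ID: the last row gets 0, the first row gets 6.
rev = demo.groupby('ID').cumcount(ascending=False)

# Integer division by 3 groups the rows into blocks of 3, counted from the end.
bins = rev // 3
print(bins.tolist())     # [2, 1, 1, 1, 0, 0, 0]

# Sum per (ID, bin), then a running total over the bins of each ID.
per_bin = demo.groupby(['ID', bins])['val'].sum()
running = per_bin.groupby(level=0).cumsum()
print(running.tolist())  # [18, 27, 28]
```

Because the counter starts at the last row, bin 0 is always a full block of the 3 most recent rows; when the number of rows per ID is not a multiple of 3, it is the oldest bin that ends up incomplete.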
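# last row per ID (latest month)
df2 = df.groupby('ID')[['Year'] + cols].last()
# add a second column level (-1) so df2 has the same column shape as df1
df2.columns = pd.MultiIndex.from_product([df2.columns, [-1]])
print (df2)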
          Year  R1 R1_f
            -1  -1   -1
ID
KAR1  20210901  12   16
KAR2  20210901  20    4
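The extra -1 level is what lets the two frames be joined and then ordered together: both end up with two-level columns, and -1 sorts before the bin labels 0, 1, 2, 3. A small sketch of that idea, using hypothetical one-row frames left and right that only mimic the shapes of df2 and df1:

```python
import pandas as pd

# Hypothetical stand-in for df2: one value column, given a second level of -1.
left = pd.DataFrame({'R1': [12]}, index=['KAR1'])
left.columns = pd.MultiIndex.from_product([left.columns, [-1]])

# Hypothetical stand-in for df1: the same value column with bin labels 0 and 1.
right = pd.DataFrame([[33, 57]], index=['KAR1'],
                     columns=pd.MultiIndex.from_product([['R1'], [0, 1]]))

# Join on the index, then sort the columns by (name, level) pairs.
joined = left.join(right).sort_index(axis=1)
print(joined.columns.tolist())
```

After sorting, the -1 column sits directly in front of the bin columns that share its name, which is the order the final flattening step relies on.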
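# join on ID and sort the columns so each -1 column precedes its bins
df = df2.join(df1).sort_index(axis=1)
# bin b covers the latest (b + 1) * 3 months; -1 marks the columns taken from last(),
# so the '_sum' suffix there denotes the latest monthly value, not an aggregate
df.columns = [f'{(b + 1) * 3}m_{a}' if b!=-1 else f'{a}_sum' for a, b in df.columns]
df = df.reset_index()
# move the date column right after ID and restore its original name
df.insert(1, 'Year', df.pop('Year_sum'))
print (df)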
     ID      Year  R1_sum  3m_R1  6m_R1  9m_R1  12m_R1  R1_f_sum  3m_R1_f  \
0  KAR1  20210901      12     33     57     72      78        16       45
1  KAR2  20210901      20     84     98    110     119         4        9

   6m_R1_f  9m_R1_f  12m_R1_f
0       81      108       126
1       15       30        54
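The steps above depend on row order, since cumcount counts rows as they appear. If the rows are not guaranteed to be sorted by date within each ID, one option is to sort first and wrap the steps in a function. This is only a sketch of a variant, not the code above: the function name trailing_sums and its parameters are hypothetical, it assumes one row per month for each ID, and it keeps the latest values under their original column names instead of adding a '_sum' suffix.

```python
import pandas as pd

def trailing_sums(df, id_col, date_col, value_cols, step=3):
    """Hypothetical helper: cumulative sums over trailing blocks of `step` rows."""
    # Sort so that the last row of each ID really is the latest date.
    df = df.sort_values([id_col, date_col])
    # Bin 0 holds the latest `step` rows, bin 1 the `step` rows before, and so on.
    bins = df.groupby(id_col).cumcount(ascending=False) // step
    sums = (df.groupby([id_col, bins])[value_cols].sum()
              .groupby(level=0).cumsum()
              .unstack())
    # Flatten (column, bin) pairs into names such as '3m_R1'.
    sums.columns = [f'{(b + 1) * step}m_{a}' for a, b in sums.columns]
    # Latest row per ID, joined with the cumulative sums on the ID index.
    latest = df.groupby(id_col)[[date_col] + value_cols].last()
    return latest.join(sums).reset_index()

# Made-up data, deliberately out of order, to exercise the sorting step.
data = pd.DataFrame({
    'ID': ['X'] * 6,
    'Year': [20210601, 20210101, 20210301, 20210201, 20210501, 20210401],
    'R1': [6, 1, 3, 2, 5, 4],
})
out = trailing_sums(data, 'ID', 'Year', ['R1'])
print(out)
```

With IDs of different lengths, unstack leaves NaN in the bins that an ID does not have, so the sum columns may come back as floats.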