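I went through several Stack Overflow questions and am finally posting this because I could not solve one of the problems I am facing. I have a dataframe like the one below: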
A B C
group1 group1_c 12
group1 group1_c 12
group1 group1_c 12
group1 group1_c 1
group1 group1_c 12
group1 group1_c 12
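I have to compare each row with the previous row, and whenever the values match I add one to a running count. To do this, I use:
df['cumul'] = df['C'].eq(df.groupby(['A','B'])['C'].shift(1).ffill()).groupby([df['A'],df['B']]).cumsum()
Once I do this, I get: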
A B C Cumul
group1 group1_c 12 0
group1 group1_c 12 1
group1 group1_c 12 2
group1 group1_c 1 2
group1 group1_c 12 3
group1 group1_c 12 3
However, I want the count to reset whenever the condition is not met. Expected result:
A B C Cumul
group1 group1_c 12 0
group1 group1_c 12 1
group1 group1_c 12 2
group1 group1_c 1 0
group1 group1_c 12 0
group1 group1_c 12 1
Please advise.
Thanks
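If you need to count the rows within each run of consecutive equal values in column C, use Series.ne with Series.shift and a cumulative sum to label the runs, then the counter GroupBy.cumcount: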
df['cumul'] = df.groupby(df['C'].ne(df['C'].shift()).cumsum()).cumcount()
print (df)
A B C cumul
0 group1 group1_c 12 0
1 group1 group1_c 12 1
2 group1 group1_c 12 2
3 group1 group1_c 1 0
4 group1 group1_c 12 0
5 group1 group1_c 12 1
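The one-liner above can be unpacked into its three steps. This is a minimal sketch, assuming the sample frame from the question is rebuilt by hand; the names starts and run_id are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "A": ["group1"] * 6,
    "B": ["group1_c"] * 6,
    "C": [12, 12, 12, 1, 12, 12],
})

# True wherever C differs from the previous row; the first row compares
# against NaN, so it always starts a new run
starts = df["C"].ne(df["C"].shift())

# A running total of the run starts gives every consecutive run its own id
run_id = starts.cumsum()

# cumcount numbers the rows inside each run from 0, so the counter
# resets every time the value of C changes
df["cumul"] = df.groupby(run_id).cumcount()
print(df)
```

The counter resets because it is computed per run rather than as a single cumulative sum over the whole column, which can never decrease.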
If you need the count per A, B groups, also add those two columns to the grouping keys:
print (df)
A B C
0 group1 group1_c 12
1 group1 group2_c 12 <-changed groups
2 group1 group2_c 12 <-changed groups
3 group1 group1_c 1
4 group1 group1_c 12
5 group1 group1_c 12
s = df['C'].ne(df['C'].shift()).cumsum()
df['cumul'] = df.groupby([df['A'],df['B'], s]).cumcount()
df['cumul1'] = df.groupby(df['C'].ne(df['C'].shift()).cumsum()).cumcount()
print (df)
A B C cumul cumul1
0 group1 group1_c 12 0 0
1 group1 group2_c 12 0 1
2 group1 group2_c 12 1 2
3 group1 group1_c 1 0 0
4 group1 group1_c 12 0 0
5 group1 group1_c 12 1 1
Alternative solution:
s = df[['A','B','C']].ne(df[['A','B','C']].shift()).any(axis=1).cumsum()
df['cumul'] = df.groupby(s).cumcount()
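The two variants give the same result on the sample data, but they are not interchangeable in general. The sketch below uses a small hypothetical frame (not from the question) in which B alternates while C stays constant, to show where they diverge; per_group and per_row_run are illustrative names.

```python
import pandas as pd

# Hypothetical data: C never changes, but B alternates x, y, x
df = pd.DataFrame({
    "A": ["group1"] * 3,
    "B": ["x", "y", "x"],
    "C": [5, 5, 5],
})

# Variant 1: runs are defined by C only, then split by A and B.
# Rows 0 and 2 share (A, B, run id), so they are counted together
# even though they are not adjacent.
s = df["C"].ne(df["C"].shift()).cumsum()
per_group = df.groupby([df["A"], df["B"], s]).cumcount()

# Variant 2: a change in any of A, B, C starts a new run,
# so only adjacent identical rows are counted together.
keys = df[["A", "B", "C"]]
s_any = keys.ne(keys.shift()).any(axis=1).cumsum()
per_row_run = df.groupby(s_any).cumcount()

print(per_group.tolist())    # [0, 0, 1]
print(per_row_run.tolist())  # [0, 0, 0]
```

If the rows of each A, B group are always contiguous, as in the question, either variant should work; otherwise the choice depends on whether non-adjacent rows of the same group should continue the count.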