Double groupby
We first group by 'Name', then group again by 'Subset Cluster' and 'Type Cluster':
out = (
    df.assign(**{'Subset Cluster': df['Subset'].str.extractall(r'[^a-zA-Z]*([a-zA-Z]+)[^,]*')
                                   .groupby(level=0)[0].agg(', '.join)})
      .sort_values(by=df.columns.tolist())
      # first pass: one row per Name, joining the values of every other column
      .groupby('Name', as_index=False).agg(', '.join)
      .rename(columns={'Type': 'Type Cluster'})
      # second pass: merge the names that share the same pair of clusters
      .groupby(['Subset Cluster', 'Type Cluster'], as_index=False).agg(', '.join)
)
Output:
   Subset Cluster  Type Cluster      Name                             Subset            System
0      IM, IM, IT    LP, OP, OP  B03, D09  IM-09-B, IM03A, IT-09, IM, IM, IT  A, A, A, B, A, A
1          IT, IU        PP, OP  A00, B01           IT00, IU00-A, IT-01A, IU        A, A, B, B
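To try the double groupby end to end, here is a minimal self-contained sketch. The input frame is an assumption: the original data is not shown here, so the rows below were made up to be consistent with the output above, with one Subset value per row. The function name `cluster_names` is also hypothetical.

```python
import pandas as pd

# Hypothetical input, reconstructed from the output above (not the asker's real data).
df = pd.DataFrame({
    'Name':   ['A00', 'A00', 'B01', 'B01', 'B03', 'B03', 'B03', 'D09', 'D09', 'D09'],
    'Subset': ['IT00', 'IU00-A', 'IT-01A', 'IU', 'IM-09-B', 'IM03A', 'IT-09', 'IM', 'IM', 'IT'],
    'Type':   ['PP', 'OP', 'PP', 'OP', 'LP', 'OP', 'OP', 'LP', 'OP', 'OP'],
    'System': ['A', 'A', 'B', 'B', 'A', 'A', 'A', 'B', 'A', 'A'],
})


def cluster_names(df):
    """Group by Name, then merge names sharing the same Subset/Type clusters."""
    # Leading letter run of each comma-separated item in Subset, re-joined per row.
    subset_cluster = (
        df['Subset'].str.extractall(r'[^a-zA-Z]*([a-zA-Z]+)[^,]*')
        .groupby(level=0)[0].agg(', '.join)
    )
    return (
        df.assign(**{'Subset Cluster': subset_cluster})
          # Sort first so the joined strings are comparable across names.
          .sort_values(by=df.columns.tolist())
          # First pass: collapse each Name to a single row.
          .groupby('Name', as_index=False).agg(', '.join)
          .rename(columns={'Type': 'Type Cluster'})
          # Second pass: collapse names with identical cluster pairs.
          .groupby(['Subset Cluster', 'Type Cluster'], as_index=False).agg(', '.join)
    )


out = cluster_names(df)
print(out)
```

The sort before the first groupby matters: two names only land in the same group in the second pass if their joined 'Subset Cluster' and 'Type Cluster' strings are identical, so the rows need a consistent order within each name.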