“任一”部分使事情变得复杂,但仍然应该是可行的。
Option 1
由于其他用户决定将其变成一场速度竞赛,这是我的:
from collections import Counter
from itertools import chain
c = Counter(chain.from_iterable(set(x) for x in df1.values.tolist()))
df2['count'] = df2['ID'].map(Counter(c))
df2
ID count
0 jack 3
1 jill 5
2 jane 8
3 joe 9
4 ben 7
5 beatrice 6
176 µs ± 7.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Option 2
(原答案)stack
based
c = df1.stack().groupby(level=0).value_counts().count(level=1)
Or,
c = df1.stack().reset_index(level=0).drop_duplicates()[0].value_counts()
Or,
v = df1.stack()
c = v.groupby([v.index.get_level_values(0), v]).count().count(level=1)
# c = v.groupby([v.index.get_level_values(0), v]).nunique().count(level=1)
And,
df2['count'] = df2.ID.map(c)
df2
ID count
0 jack 3
1 jill 5
2 jane 8
3 joe 9
4 ben 7
5 beatrice 6
Option 3
repeat
基于重塑和计数
v = pd.DataFrame({
'i' : df1.values.reshape(-1, ),
'j' : df1.index.repeat(2)
})
c = v.loc[~v.duplicated(), 'i'].value_counts()
df2['count'] = df2.ID.map(c)
df2
ID count
0 jack 3
1 jill 5
2 jane 8
3 joe 9
4 ben 7
5 beatrice 6
Option 4
concat
+ mask
v = pd.concat(
[df1.ID_a, df1.ID_b.mask(df1.ID_a == df1.ID_b)], axis=0
).value_counts()
df2['count'] = df2.ID.map(v)
df2
ID count
0 jack 3
1 jill 5
2 jane 8
3 joe 9
4 ben 7
5 beatrice 6