说我有一个num_indices * n
指数矩阵(在range(m)
) and a num_indices * n
值矩阵,即
m, n = 100, 50
num_indices = 100000
indices = np.random.randint(0, m, (num_indices, n))
values = np.random.uniform(0, 1, (num_indices, n))
我想得到一个形状的结果矩阵m * n
,这样每一列都会被索引并在索引和值矩阵中的相应列上求和。以下是一些实现:
tic = time.time()
res1 = np.zeros((m, n))
for i in range(num_indices):
for j in range(n):
res1[indices[i, j], j] += values[i, j]
print(f'element-wise for loop used {time.time() - tic:.5f}s')
tic = time.time()
res2 = np.zeros((m, n))
np.add.at(res2, (indices, np.arange(n)[None, :]), values)
print(f'np.add.at used {time.time() - tic:.5f}s')
tic = time.time()
reslst = []
for colid in range(n):
rescol = np.zeros((m, 1))
np.add.at(rescol, (indices[:, colid], np.zeros(num_indices, dtype=int)), values[:, colid])
reslst.append(rescol)
res3 = np.hstack(reslst)
print(f'np.add.at with loop used {time.time() - tic:.5f}s')
from scipy import sparse
tic = time.time()
res4 = sparse.csr_matrix((values.ravel(), (indices.ravel(), np.tile(np.arange(n), num_indices))), shape=(m, n)).toarray()
print(f'sparse.csr used {time.time() - tic:.5f}s')
tic = time.time()
reslst = []
for colid in range(n):
rescol = sparse.csr_matrix((values[:, colid], (indices[:, colid], np.zeros(num_indices, dtype=int))), shape=(m, 1)).toarray()
reslst.append(rescol)
res5 = np.hstack(reslst)
print(f'sparse.csr with loop used {time.time() - tic:.5f}s')
结果都是一样的:
assert np.isclose(res2, res1).all()
assert np.isclose(res3, res1).all()
assert np.isclose(res4, res1).all()
assert np.isclose(res5, res1).all()
然而,速度很奇怪,而且不令人满意:
for loop used 1.69889s
np.add.at used 0.17287s
np.add.at with loop used 0.47767s
sparse.csr used 0.16847s
sparse.csr with loop used 0.09845s
我的基本问题是:
- Why is
np.add.at
这么慢 - 甚至比scipy.sparse
?我认为 numpy 会受益匪浅,尤其是在多维情况下。
- For
scipy.sparse
,为什么多维比for循环还慢?没有使用并发吗? (为什么对于 numpy 来说,多维更快?)
- 总的来说,有没有更快的方法来实现我想要的?