EDIT:
In fact, you can build M1 directly with the same as_strided approach:
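import numpy as np
from numpy.lib.stride_tricks import as_strided

def M1_strided(a):
    a = np.asarray(a)
    n = len(a)
    # Pad with n - 1 leading zeros; every row is a length-n window into a0.
    a0 = np.concatenate([np.zeros(n - 1, a.dtype), a])
    # Take the stride from the new contiguous buffer, since a may be a strided view.
    s, = a0.strides
    return as_strided(a0, (n, n), (s, s), writeable=False)[:, ::-1]

print(M1_strided(np.array([10, 20, 30, 40])))
# [[10  0  0  0]
#  [20 10  0  0]
#  [30 20 10  0]
#  [40 30 20 10]]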
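As a sanity check on the windowing logic, the strided result can be compared against a plain double loop. This is only a sketch: `m1_strided` restates the function above, and `m1_reference` is a hypothetical helper written for this comparison, assuming M1[i, j] = a[i - j] for j <= i and 0 elsewhere.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def m1_strided(a):
    # Same idea as M1_strided above: windows over a zero-padded copy.
    a = np.asarray(a)
    n = len(a)
    a0 = np.concatenate([np.zeros(n - 1, a.dtype), a])
    s, = a0.strides
    return as_strided(a0, (n, n), (s, s), writeable=False)[:, ::-1]


def m1_reference(a):
    # Hypothetical reference implementation (slow, but obviously correct):
    # M1[i, j] = a[i - j] on and below the diagonal, 0 above it.
    a = np.asarray(a)
    n = len(a)
    out = np.zeros((n, n), a.dtype)
    for i in range(n):
        for j in range(i + 1):
            out[i, j] = a[i - j]
    return out


a = np.array([10, 20, 30, 40])
print(np.array_equal(m1_strided(a), m1_reference(a)))  # True
```

The loop version is only there to pin down the intended layout; it is not meant as a competitor in the timings.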
In this case the speed advantage is even greater, because you save the call to np.tril (https://docs.scipy.org/doc/numpy/reference/generated/numpy.tril.html):
N = 100
a = np.square(np.arange(N))
%timeit np.tril(M2_simple(a))
# 792 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit np.tril(M2_indexing(a))
# 259 µs ± 9.45 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit np.tril(M2_strided(a))
# 134 µs ± 1.68 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit M1_strided(a)
# 45.2 µs ± 583 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
You can build the M2 matrix more efficiently with np.lib.stride_tricks.as_strided (https://docs.scipy.org/doc/numpy/reference/generated/numpy.lib.stride_tricks.as_strided.html):
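import numpy as np
from numpy.lib.stride_tricks import as_strided

def M2_strided(a):
    a = np.asarray(a)
    n = len(a)
    # Reversed input repeated twice; every row is a length-n window into a2.
    a2 = np.tile(a[::-1], 2)
    # Take the stride from the new contiguous buffer, since a may be a strided view.
    s, = a2.strides
    return as_strided(a2, (n, n), (s, s), writeable=False)[::-1]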
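As an added benefit, you only use twice the memory of the original array (instead of a buffer of squared size). You just need to be careful not to write to an array created this way (which should not be a problem if you are going to call np.tril (https://docs.scipy.org/doc/numpy/reference/generated/numpy.tril.html) afterwards). I added writeable=False to disallow write operations.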
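To make the read-only and memory-sharing behaviour concrete, here is a minimal sketch. `circulant_view` is a hypothetical helper that follows the same idea as M2_strided but also returns the underlying buffer, so the two can be inspected side by side.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def circulant_view(a):
    # Hypothetical helper: same construction as M2_strided, but the
    # 2n-element backing buffer is returned as well for inspection.
    a = np.asarray(a)
    n = len(a)
    base = np.tile(a[::-1], 2)
    s, = base.strides
    view = as_strided(base, (n, n), (s, s), writeable=False)[::-1]
    return view, base


m, base = circulant_view(np.array([10, 20, 30, 40]))

print(m)
print(m.flags.writeable)          # False
print(np.shares_memory(m, base))  # True: the n x n view reuses the 2n buffer
print(base.size, m.size)          # 8 16

try:
    m[0, 0] = 99
except ValueError as exc:
    # NumPy refuses to assign into a read-only array.
    print("write rejected:", exc)
```

Without writeable=False, a single assignment would silently change several entries of the matrix at once, because many entries alias the same element of the buffer.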
A quick speed comparison with IPython:
N = 100
a = np.square(np.arange(N))
%timeit M2_simple(a)
# 693 µs ± 17.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit M2_indexing(a)
# 163 µs ± 1.88 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit M2_strided(a)
# 38.3 µs ± 348 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)