它们可能不属于“简单框架”,因为它们是需要安装的第三方模块,但我经常使用两个框架:
-
simple_benchmark https://github.com/MSeifert04/simple_benchmark(我是该包的作者)
- perfplot https://github.com/nschloe/perfplot
例如simple_benchmark
库允许装饰函数来进行基准测试:
from simple_benchmark import BenchmarkBuilder
b = BenchmarkBuilder()
import pandas as pd
import numpy as np
from numba import njit
@b.add_function()
def sum_pd(df):
return df.groupby('Group').Value.sum()
@b.add_function()
def sum_fc(df):
f, u = pd.factorize(df.Group.values)
v = df.Value.values
return pd.Series(np.bincount(f, weights=v).astype(int), pd.Index(u, name='Group'), name='Value').sort_index()
@njit
def wbcnt(b, w, k):
bins = np.arange(k)
bins = bins * 0
for i in range(len(b)):
bins[b[i]] += w[i]
return bins
@b.add_function()
def sum_nb(df):
b, u = pd.factorize(df.Group.values)
w = df.Value.values
bins = wbcnt(b, w, u.size)
return pd.Series(bins, pd.Index(u, name='Group'), name='Value').sort_index()
还修饰一个生成基准值的函数:
from string import ascii_uppercase
def creator(n): # taken from another answer here
letters = list(ascii_uppercase)
np.random.seed([3,1415])
df = pd.DataFrame(dict(
Group=np.random.choice(letters, n),
Value=np.random.randint(100, size=n)
))
return df
@b.add_arguments('Rows in DataFrame')
def argument_provider():
for exponent in range(4, 22):
size = 2**exponent
yield size, creator(size)
然后运行基准测试所需要做的就是:
r = b.run()
之后,您可以检查结果作为绘图(您需要matplotlib
为此的库):
r.plot()
如果函数在运行时非常相似,则百分比差异而不是绝对数字可能更重要:
r.plot_difference_percentage(relative_to=sum_nb)
或者获取基准时间为DataFrame
(这需要pandas
)
r.to_pandas_dataframe()
sum_pd sum_fc sum_nb
16 0.000796 0.000515 0.000502
32 0.000702 0.000453 0.000454
64 0.000702 0.000454 0.000456
128 0.000711 0.000456 0.000458
256 0.000714 0.000461 0.000462
512 0.000728 0.000471 0.000473
1024 0.000746 0.000512 0.000513
2048 0.000825 0.000515 0.000514
4096 0.000902 0.000609 0.000640
8192 0.001056 0.000731 0.000755
16384 0.001381 0.001012 0.000936
32768 0.001885 0.001465 0.001328
65536 0.003404 0.002957 0.002585
131072 0.008076 0.005668 0.005159
262144 0.015532 0.011059 0.010988
524288 0.032517 0.023336 0.018608
1048576 0.055144 0.040367 0.035487
2097152 0.112333 0.080407 0.072154
如果您不喜欢装饰器,您也可以在一次调用中设置所有内容(在这种情况下,您不需要BenchmarkBuilder
和add_function
/add_arguments
装饰器):
from simple_benchmark import benchmark
r = benchmark([sum_pd, sum_fc, sum_nb], {2**i: creator(2**i) for i in range(4, 22)}, "Rows in DataFrame")
Here perfplot
提供了非常相似的界面(和结果):
import perfplot
r = perfplot.bench(
setup=creator,
kernels=[sum_pd, sum_fc, sum_nb],
n_range=[2**k for k in range(4, 22)],
xlabel='Rows in DataFrame',
)
import matplotlib.pyplot as plt
plt.loglog()
r.plot()