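I am aware of this question (https://stackoverflow.com/questions/4033821/using-a-smoother-with-the-l-method-to-determine-the-number-of-k-means-clusters) and this one (https://stackoverflow.com/questions/2018178/finding-the-best-trade-off-point-on-a-curve) on the topic. This time, though, I want to settle on an actual implementation in Python.
My only problem is that the elbow point seems to change from one run of my code to the next. Consider the two plots shown in this post. Although they look visually similar, the value of the elbow point differs significantly between them. Both curves were generated by averaging 20 different runs, and even so the elbow point shifts significantly. What precautions can I take to ensure that the value falls within a certain range?
My attempt is shown below: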
def elbowPoint(points):
    # Discrete second derivative at each interior index i = 1 .. len(points) - 2.
    # A list is used because dict.values() has no .index() in Python 3.
    secondDerivative = [points[i+1] + points[i-1] - 2*points[i]
                        for i in range(1, len(points) - 1)]
    # List position 0 corresponds to index 1 in points, hence the + 1.
    max_index = secondDerivative.index(max(secondDerivative))
    elbow_point = max_index + 1
    return elbow_point

points = [0.80881476685027154, 0.79457906121371058, 0.78071124401504677, 0.77110686192601441, 0.76062373158581287, 0.75174963969985187, 0.74356408965979193, 0.73577573557299236, 0.72782434749305047, 0.71952590556748364, 0.71417942487824781, 0.7076502559300516, 0.70089375208028415, 0.69393584640497064, 0.68550490458450741, 0.68494440529025913, 0.67920157634796108, 0.67280267176628761]
max_point = elbowPoint(points)
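
A discrete second derivative amplifies small fluctuations, so on a nearly linear curve like the one above its maximum can jump between indices from run to run. For comparison, here is a minimal sketch of a different heuristic: pick the point farthest from the straight line joining the first and last points. It is not the method used in the attempt above. The name `elbow_by_chord_distance` is hypothetical, and the x-coordinate of each point is assumed to be its list index.

```python
import math


def elbow_by_chord_distance(points):
    """Return the index of the point farthest from the chord that joins
    the first and last points of the curve (x is taken to be the index)."""
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")

    # Direction of the chord from (0, points[0]) to (n - 1, points[-1]).
    dx = n - 1
    dy = points[-1] - points[0]
    norm = math.hypot(dx, dy)

    best_index, best_distance = 0, -1.0
    for i, y in enumerate(points):
        # Perpendicular distance from (i, y) to the chord.
        distance = abs(dy * i - dx * (y - points[0])) / norm
        if distance > best_distance:
            best_index, best_distance = i, distance
    return best_index


# Small synthetic curve with a visible bend (illustrative data only).
sample = [10, 5, 2, 1.5, 1.2, 1.0]
print(elbow_by_chord_distance(sample))  # → 2
```

The same function can be called on the `points` list from the post. Because it measures each point against the whole curve rather than against its two neighbours, it should react less to a single noisy value, though it is still only a heuristic.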