All,
I am looking for a reliable, unsupervised way to detect change points in relatively short vectors. Consider the following two examples:
v1 = c(0.299584,0.314446,0.357783,0.388896,0.410417,0.427182,0.450383,0.466671,0.474884,0.474749,0.493566,0.500374,0.522482,0.529851,0.538387,0.577901,0.610939,0.639383,0.662433,0.692656,0.720543,0.738255,0.748055,0.7591,0.770595,0.781811,0.794479,0.794588,0.789448,0.77667,0.765406,0.75152,0.740408,0.726898,0.720766,0.709445,0.69896,0.687508,0.673382,0.65795,0.639214,0.620445,0.590047,0.561773,0.526807,0.486848,0.439681,0.387545,0.313369,0.282872,0.279908,0.271836,0.269088,0.262727,0.259782)
v2 = c(0.081309,0.206263,0.429069,0.511859,0.565194,0.578792,0.56919,0.51985,0.432563,0.193907,0.0771,0.086603,0.18303,0.177608,0.169706,0.260917,0.292062,0.2979,0.263249,0.270576,0.250422,0.25219,0.182878,0.080623,0.079443,0.088944,0.087623,0.126403,0.155563,0.273942,0.312054,0.370195,0.357087,0.336452,0.300574,0.243105,0.243105,0.25593,0.227401,0.218047,0.15857,0.157727,0.139801,0.125742,0.129142,0.142166,0.142166,0.136748,0.107755,0.064377,0.072801,0.060093,0.103441,0.111704,0.124544)
If you look at
plot(v1,type='l')
and
plot(v2,type='l')
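you can see that for v1 I would like to detect the change near index 28, and for v2 I would like to detect changes at indices 8, 11, 18, 25, 32, and 51. So far I have tried a Bayesian change point algorithm, which does a decent job of identifying where inflection points may occur (regions of low posterior probability), but it still forces me to rely on visual inspection to make the final call: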
install.packages('bcp')
library(bcp)
test = bcp(v1,w0=0.2,p0=0.01)
plot(v1,type='l')
par(new=TRUE)
plot(test$posterior.prob,type='l',col=2)
test = bcp(v2,w0=0.2,p0=0.01)
plot(v2,type='l')
par(new=TRUE)
plot(test$posterior.prob,type='l',col=2)
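For reference, the visual step above amounts to picking out the peaks in the posterior probability curve. A minimal sketch of doing that mechanically, assuming a fixed cutoff is acceptable: the helper name pick_changepoints, the 0.5 threshold, and the toy probabilities are all hypothetical and not part of bcp.

```r
# Hypothetical helper (not part of bcp): reduce a vector of posterior
# change probabilities to one index per run of values above a cutoff.
pick_changepoints <- function(prob, threshold = 0.5) {
  hits <- which(prob > threshold)            # which() drops NA entries
  if (length(hits) == 0) return(integer(0))
  # Label runs of consecutive indices so neighbouring hits count once
  run_id <- cumsum(c(1, diff(hits) > 1))
  # Within each run, keep the index with the highest probability
  unname(vapply(split(hits, run_id),
                function(idx) idx[which.max(prob[idx])],
                integer(1)))
}

# Made-up probabilities, for illustration only (not output from bcp)
toy_prob <- c(0.01, 0.02, 0.70, 0.90, 0.10, 0.05, 0.60, 0.03, NA)
cp <- pick_changepoints(toy_prob)
print(cp)
```

The same call could be tried as pick_changepoints(test$posterior.prob), but the cutoff is still chosen by hand, so this only moves the judgment call rather than removing it.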
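Is there a way to select estimates of multiple change points in this kind of data automatically and without supervision? Maybe I am just searching in vain for a substitute for human intuition :P I also looked at the changepoint package, but it does not seem to be designed for this kind of data.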
Thanks,
Aaron