我正在对南太平洋岛屿泻湖中宽吻海豚的分布进行建模。我想使用肥皂膜平滑器来模拟海豚在二维表面(经度 x 纬度)上存在的概率,考虑到陆地边界(显然海豚不能在陆地上行走)。
我想知道如何将我的研究区域(陆地和近海水域)的边界固定为等于零的条件,因为我预计在陆地上以及最近海处找到海豚的概率为零我研究区域的水域(这种海豚只在泻湖浅水区发现)。到目前为止,我已经测试了下面将描述的几种方法,但是我的模型预测的地图并不符合我对如何处理边界条件的期望。
这是数据集映射后的样子。海豚位置以蓝色显示,而缺席位置以粉色显示。陆地以黑色显示,泻湖周围的珊瑚礁以灰色显示。
第 1 步:创建边界和结对象
我创建了一个名为的边界多边形soap_bnd,将其转换为名为的列表列表bound并用结填满它肥皂结。边界包括由研究区域发现的两个岛屿形成的两个内部“洞”。以下代码的灵感来自:Gavin Simpson 的优秀博文https://www.fromthebottomoftheheap.net/2016/03/27/soap-film-smoothers/ https://www.fromthebottomoftheheap.net/2016/03/27/soap-film-smoothers/ and 我使用该功能自动紧缩由大卫·L·米勒 (David L Miller) 创建(参见https://github.com/dill/soap_checker https://github.com/dill/soap_checker).
位置位于投影 UTM 坐标系中(因此坐标列称为 utmx 和 utmy)
## boundary of the soap film (one polygon with two inner loops)
# soap_bnd is initially a SpatialPolygonDataFrame
crds <- tidy(soap_bnd)
crds <- crds %>% dplyr::select(long, lat, piece)
names(crds) <- c("x", "y", "piece")
bound <- split(crds, crds$piece)
bound <-lapply(bound,`[`, c(1, 2))
nr <- seq(1, 3)
bound <-lapply(nr, function(n) as.list.data.frame(bound[[n]]))
## bound is a list of 3 lists (because 1 large polygon and 2 loops), each including 3 elements (utmx, utmy and f)
## an element f is set to a vector of zeros to set the boundary condition
bound <- lapply(nr, function(n) bound[[n]] <- c(bound[[n]], list(f = rep(0, length(bound[[n]]$x)))))
## created a grid of knots within the boundary
soap_knots <- make.soapgrid(bound[[1]][c("x", "y")], 20)
# removing the knots that overlap with the inner loops (the islands)
x <- soap_knots$x
y <- soap_knots$y
soap_knots <- soap_knots[!inSide(x = x, y = y, bnd = bound[[2]]) &
!inSide(x = x, y = y, bnd = bound[[3]]),]
# just changing names so everything matches
bound <- llply(bound, function(b){names(b)[1:2] <- c("utmx", "utmy"); return(b)})
names(soap_knots) <- c("utmx", "utmy")
# remove points too close to the boundary
crunch_ind <- autocruncher(bound, soap_knots, k = 20, xname = "utmx", yname = "utmy")
soap_knots <- soap_knots[-crunch_ind, ]
## plotting the final product
plot(soap_bnd)
points(soap_knots)
第2步:运行游戏
GAM 模型是二项式的,因为我正在对海豚位置 (1) 与其他不存在海豚的位置 (0) 的分布进行建模。
m1.1 <- mgcv::gam(response ~
s(utmx, utmy, bs = "so", xt = list(bnd = bound)),
knots = soap_knots,
family = binomial, method = "REML",
data = target_df)
# predict the probability of presence using a raster stack called envLS_stack (which includes other environmental variables which are not used here)
pred_ras <- raster::predict(envLS_stack[[c("utmx", "utmy")]], m1.1, type = "response")
# plot the predictions - included between zero (low probability of presence) and one (high probability of presence)
plot(pred_ras, col = rev(brewer.pal(11, "Spectral")), axes = F)
测试和问题
这是海豚出现的概率(陆地和近海水域为白色)。
我不明白为什么边界上的概率结果是最高的,尽管我设置了f边界为零。
我尝试了另一种方法,删除了f我的元素bound列表。我使用以下代码重新运行 GAM:
bound_nof <- llply(bound, function(d){list(utmx = d$utmx, utmy = d$utmy)})
# I tried this model with k = 20 or k = 60
m1.2 <- mgcv::gam(response ~
s(utmx, utmy, k = 20, bs = "so", xt = list(bnd = bound_nof)),
knots = soap_knots,
family = binomial, method = "REML",
data = target_df)
这是我分别使用 k=20 和 k=60 时得到的结果:
我不明白这是怎么回事?在第一张地图(k = 20)中,模型似乎强烈地捕捉到了在研究区域最西端进行的单个海豚观察,并在那里产生了一大片高概率的存在......从统计学上来说,我确实如此不认为这种模式是相关的。第二张地图(k=60)更令人费解……
我想这种模式可能是由于我的研究区域西部和南部缺乏数据造成的。是否有必要减小产生皂膜的有限区域的尺寸?缺点是这会阻止模型进一步外推到我感兴趣的更大区域。