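You can build a regular expression pattern from the vector of interest and apply it to every column of the data frame except the patient id, then use rowSums to check whether any var in a row matches the pattern: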
library(dplyr)
pattern = paste("^(", paste0(dx, collapse = "|"), ")", sep = "")
pattern
# [1] "^(866|867)"
filter(observations, rowSums(sapply(observations[-1], grepl, pattern = pattern)) != 0)
# A tibble: 2 × 4
# patient var1 var2 var3
# <chr> <chr> <chr> <chr>
#1 a 8661 8651 2430
#2 b 865 8674 3456
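To see why this works, it can help to look at the intermediate logical matrix. The sketch below uses base R only and rebuilds a small observations table as an assumption: rows a to c mirror the outputs shown in this answer, and row d is a made-up non-matching row. The names hits, keep and matched are illustrative.

```r
# Assumed sample data: rows a-c mirror the outputs in this answer; row d is hypothetical.
observations <- data.frame(
  patient = c("a", "b", "c", "d"),
  var1 = c("8661", "865", "8651", "1234"),
  var2 = c("8651", "8674", "2866", "5678"),
  var3 = c("2430", "3456", "9089", "9012"),
  stringsAsFactors = FALSE
)
dx <- c("866", "867")
pattern <- paste0("^(", paste0(dx, collapse = "|"), ")")

# One logical column per var: TRUE where the value starts with a code in dx.
hits <- sapply(observations[-1], grepl, pattern = pattern)

# A row is kept when at least one of its columns matched.
keep <- rowSums(hits) != 0
matched <- observations[keep, ]
print(hits)
print(matched$patient)
```

With this data only patients a and b are kept, because 8661 and 8674 are the only values that start with one of the codes.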
Another option is to use Reduce with lapply:
filter(observations, Reduce("|", lapply(observations[-1], grepl, pattern = pattern)))
# A tibble: 2 × 4
# patient var1 var2 var3
# <chr> <chr> <chr> <chr>
#1 a 8661 8651 2430
#2 b 865 8674 3456
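Here Reduce("|", ...) folds the per-column logical vectors together with an element-wise OR, so no matrix needs to be built. A minimal sketch with made-up vectors standing in for what lapply(..., grepl) would return (col_hits and any_hit are illustrative names):

```r
# Hypothetical per-column match results for three rows.
col_hits <- list(
  var1 = c(TRUE, FALSE, FALSE),
  var2 = c(FALSE, TRUE, FALSE),
  var3 = c(FALSE, FALSE, FALSE)
)

# Reduce applies `|` pairwise: (var1 | var2) | var3, element by element.
any_hit <- Reduce("|", col_hits)
print(any_hit)
```

The result has one logical value per row, which is the form filter expects as its condition.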
This also works when you have more than two patterns and the patterns have different character lengths, for example if dx is defined as dx<-c("866","867", "9089"):
dx<-c("866","867", "9089")
pattern = paste("^(", paste0(dx, collapse = "|"), ")", sep = "")
pattern
# [1] "^(866|867|9089)"
filter(observations, Reduce("|", lapply(observations[-1], grepl, pattern = pattern)))
# A tibble: 3 × 4
# patient var1 var2 var3
# <chr> <chr> <chr> <chr>
#1 a 8661 8651 2430
#2 b 865 8674 3456
#3 c 8651 2866 9089
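Note that the leading ^ anchors the match at the start of each value, which is why 2866 for patient c does not count as a match for 866 while 9089 does. A quick check, using values taken from the output above (values and starts_with_code are illustrative names):

```r
pattern <- "^(866|867|9089)"
values <- c("8661", "2866", "9089", "865")

# Only values that begin with one of the codes match; "2866" merely contains "866".
starts_with_code <- grepl(pattern, values)
print(starts_with_code)
```

Only the first and third values match here. Dropping the ^ would turn this into a "contains" test and 2866 would match as well.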
Check this and this stack answer for more about multiple "or" conditions in regex.