我正在使用 R 编程语言。假设我有以下两个表:
table_1 = data.frame(id = c("123", "123", "125", "125"), id2 = c("11", "12", "14", "13"),
date_1 = c("2010-01-31","2010-01-31", "2015-01-31", "2018-01-31" ))
table_1$id = as.factor(table_1$id)
table_1$id2 = as.factor(table_1$id2)
table_1$date_1 = as.factor(table_1$date_1)
table_2 = data.frame(id = c("123", "123", "125", "125"), id2 = c("111", "112", "14", "113"),
date_2 = c("2009-01-31","2010-01-31", "2010-01-31", "2010-01-31" ),
date_3 = c("2011-01-31","2010-01-31", "2020-01-31", "2020-01-31" ))
table_2$id = as.factor(table_2$id)
table_2$id2 = as.factor(table_2$id2)
table_2$date_2 = as.factor(table_2$date_2)
table_2$date_3 = as.factor(table_2$date_3)
如果满足以下两个条件中的任何一个(即,我想执行“连接”(任何连接都可以,我只是想学习一般如何执行此操作)如果 Condition_1 = TRUE 或 Condition_2 = TRUE,则“加入”)
条件_1
- 如果表_1$id = 表_2$id
AND
- if table_1$date BETWEEN(table_2$date_2,table_2$date_3)
条件_2
- 如果表_1$id2 = 表_2$id2
AND
- if table_1$date BETWEEN(table_2$date_2,table_2$date_3)
我已经尝试过的:我知道如何单独执行这两个连接,例如:
library(sqldf)
#Condition_1
final = sqldf("select a.*, b.*
from table_1 a left join table_2 b
on a.id = b.id and
a.date_1 between
b.date_2 and
b.date_3")
#Condition_2
final_2 = sqldf("select a.*, b.*
from table_1 a left join table_2 b
on a.id2 = b.id2 and
a.date_1 between
b.date_2 and
b.date_3")
然后我可以将这些文件(“final_3”)“绑定”在一起(并删除完全重复的行):
final_3 = rbind(final, final_2)
final_3 = final_3[!duplicated(final_3[c(1,2,3,4,5,6,7)]),]
我的问题:有没有办法在一个步骤中将这两个表合并在一起,而不是两个单独的步骤?这可以使用 Base R 或 DPLYR 来完成吗?
Thanks!