我再次努力理解mult
参数在执行连接更新时起作用。
我想做的是实现中定义的左连接lj.
出于性能原因,我想更新左表
“不平凡”的部分是,当左表和右表有一个共同的列时(不考虑连接列),我想使用右表中的第一个值来覆盖左表。
我想mult
会帮助我处理这个多重匹配问题,但我无法正确处理
library(data.table)
X <- data.table(x = c("a", "a", "b", "c", "d"), y = c(0, 1, 1, 2, 2), t = 0:4)
X
# x y t
# <char> <num> <int>
#1: a 0 0
#2: a 1 1
#3: b 1 2
#4: c 2 3
#5: d 2 4
Y <- data.table(xx = c("f", "b", "c", "c", "e", "a"), y = c(2, NA, 3, 4, 5, 6), u = 2:7)
Y
# xx y u
# <char> <num> <int>
#1: f 2 2
#2: b NA 3
#3: c 3 4
#4: c 4 5
#5: e 5 6
#6: a 6 7
# Expected result
# x y t
# <char> <num> <int>
#1: a 6 0 <= single match on xx == "a" so Y[xx == "a", y] is used
#2: a 6 1 <= single match on xx == "a" so Y[xx == "a", y] is used
#3: b NA 2 <= single match on xx == "b" so Y[xx == "b", y] is used
#4: c 3 3 <= mult match on xx == "c" so Y[xx == "c", y[1L]] is used
#5: d 2 4 <= no xx == "d" in Y so nothing changes
copy(X)[Y, y := i.y, by = .EACHI, on = c(x = "xx"), mult = "first"][]
# x y t
# <char> <num> <int>
#1: a 6 0
#2: a 1 1 <= a should always have the same value ie 6
#3: b NA 2
#4: c 4 3 <= y == 4 is not the first value of y in the Y table
#5: d 2 4
# Using mult = "all" is the closest I get from the right result
copy(X)[Y, y := i.y, by = .EACHI, on = c(x = "xx"), mult = "all"][]
# x y t
# <char> <num> <int>
#1: a 6 0
#2: a 6 1
#3: b NA 2
#4: c 4 3 <= y == 4 is not the first value of y in the Y table
#5: d 2 4
有人可以向我解释上面有什么问题吗?
我想我可以用Y[X, ...]
为了达到我想要的效果,问题是 X 非常大,并且使用时我得到的性能要差得多Y[X, ...]