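My question is similar to "dplyr: grouping and summarizing/mutating data with rolling time windows" (https://stackoverflow.com/questions/36187931/dplyr-grouping-and-summarizing-mutating-data-with-rolling-time-windows). I have used it as a reference, but have not managed to adapt it to what I need.
My data looks like this: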
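library(data.table)
library(dplyr)      # used by the %>% example further down
library(lubridate)  # date helpers

a <- data.table("TYPE" = c("A", "A", "B", "B",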
"C", "C", "C", "C",
"D", "D", "D", "D"),
"DATE" = c("4/20/2018 11:47",
"4/25/2018 7:21",
"4/15/2018 6:11",
"4/19/2018 4:22",
"4/15/2018 17:46",
"4/16/2018 11:59",
"4/20/2018 7:50",
"4/26/2018 2:55",
"4/27/2018 11:46",
"4/27/2018 13:03",
"4/20/2018 7:31",
"4/22/2018 9:45"),
"CLASS" = c(1, 2, 3, 4,
1, 2, 3, 4,
1, 2, 3, 4))
From this I first ordered the data by TYPE and then by DATE, and created a column YMD that holds only the date, ignoring the time, from the DATE column:
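# DATE is still character here; parse it first so the sort is chronological
a[, DATE := as.POSIXct(DATE, format = "%m/%d/%Y %H:%M", tz = "UTC")]
a <- a[order(TYPE, DATE)]
a[, YMD := as.Date(DATE)]  # date only, time of day dropped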
Now I am trying to use the TYPE and YMD columns to produce a new column. These are the criteria I am trying to meet:
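1) Keep all columns from the original dataset
2) Create a new column named, say, EVENTS
3) For each TYPE, if it occurs more than n times within 30 days, put Y in the EVENTS column for each TYPE and YMD that made the group qualify, and N otherwise. (Note that this counts n unique dates, so a group must have n unique dates within 30 days to qualify.)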
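To make criterion 3 concrete, here is a minimal sketch of one possible reading of it, not a definitive implementation. It assumes "n unique dates within 30 days" means that some window of 30 days contains at least n distinct dates, and that every date inside such a window qualifies. The helper name `flag_dates` and the table `x` are hypothetical and exist only for illustration; `x` holds the dates of types C and D from the sample data with the time of day dropped.

```r
library(data.table)

# Hypothetical helper (not from the original post): returns TRUE for every
# date that falls inside some window of `window` days holding at least `n`
# unique dates. Assumption: "within 30 days" means a difference of <= 30 days.
flag_dates <- function(dates, n = 4, window = 30) {
  u <- sort(unique(dates))
  ok <- rep(FALSE, length(u))
  for (i in seq_along(u)) {
    in_win <- u >= u[i] & u <= u[i] + window  # window opening at u[i]
    if (sum(in_win) >= n) ok[in_win] <- TRUE
  }
  ok[match(dates, u)]  # map the result back onto the original rows
}

# Illustrative sample: types C and D from the data above, dates only
x <- data.table(
  TYPE = rep(c("C", "D"), each = 4),
  YMD  = as.Date(c("2018-04-15", "2018-04-16", "2018-04-20", "2018-04-26",
                   "2018-04-20", "2018-04-22", "2018-04-27", "2018-04-27"))
)

# := with by = TYPE adds the flag and keeps every existing column
x[, EVENTS := ifelse(flag_dates(YMD, n = 4), "Y", "N"), by = TYPE]
print(x)
```

Under these assumptions C has four distinct dates inside one 30-day window and is flagged Y, while D has only three distinct dates (4/27 occurs twice) and is flagged N. Checking only windows that open at an observed date is enough, because any window holding n dates can be shifted to start at its earliest date without losing one.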
This would be the expected output if n = 4:
This is the closest example I have, but it does not account for unique dates, nor does it keep all the columns of the table:
a %>% mutate(DATE = as.POSIXct(DATE, format = "%m/%d/%Y %H:%M")) %>%
inner_join(.,., by="TYPE") %>%
group_by(TYPE, DATE.x) %>%
summarise(FLAG = as.integer(sum(abs((DATE.x-DATE.y)/(24*60*60))<=30)>=4))
Any suggestions are appreciated.
Update
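Both answers below work for my original sample data. However, if we add a few more instances of D, they both flag all D rows as 1, instead of flagging the first 4 instances as 0 and the last 4 instances as 1. This is where the "rolling window" comes into play.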
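The difference between a count over the whole group and a windowed count can be sketched on the D dates alone. This is only an illustration: the dates are copied from the updated dataset below with the time of day dropped and put in chronological order, the 30-day window is assumed to look forward from each date, and the names `d` and `in_window` are hypothetical.

```r
# D dates from the updated dataset, in chronological order
d <- as.Date(c("2018-04-20", "2018-04-22", "2018-04-27", "2018-04-27",
               "2018-06-01", "2018-06-03", "2018-06-07", "2018-06-10"))

# Count over the whole group: 7 unique dates, so a threshold of 4 is met
# and every D row would be flagged.
length(unique(d))
# [1] 7

# Windowed count: unique dates in the 30 days starting at each date
in_window <- vapply(
  seq_along(d),
  function(i) length(unique(d[d >= d[i] & d <= d[i] + 30])),
  integer(1)
)
in_window
# [1] 3 2 1 1 4 3 2 1
```

No window that opens in April reaches 4 unique dates, whereas the window that opens on 6/01 does, which is the split between the first four and the last four D rows described above.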
Updated dataset:
a <- data.table("TYPE" = c("A", "A", "B", "B",
"C", "C", "C", "C",
"D", "D", "D", "D",
"D", "D", "D", "D"),
"DATE" = c("4/20/2018 11:47",
"4/25/2018 7:21",
"4/15/2018 6:11",
"4/19/2018 4:22",
"4/15/2018 17:46",
"4/16/2018 11:59",
"4/20/2018 7:50",
"4/26/2018 2:55",
"4/27/2018 11:46",
"4/27/2018 13:03",
"4/20/2018 7:31",
"4/22/2018 9:45",
"6/01/2018 9:07",
"6/03/2018 12:34",
"6/07/2018 1:57",
"6/10/2018 2:22"),
"CLASS" = c(1, 2, 3, 4,
1, 2, 3, 4,
1, 2, 3, 4,
1, 2, 3, 4))
The expected output for the updated data would be: