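I love how simple dplyr and tidyr have made it to create a single summary table with multiple predictor and outcome variables. One thing that stumps me is the final step: preserving/defining the order of the predictor variables and their factor levels in the output table.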
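I've come up with a solution of sorts (below), which uses mutate to manually create a factor variable that combines the predictor and the predictor value (e.g. "gender_female"), with levels in the desired output order. But my solution gets a bit long-winded when there are many variables, so I was wondering whether there is a better way?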
library(dplyr)
library(tidyr)
levels_eth <- c("Maori", "Pacific", "Asian", "Other", "European", "Unknown")
levels_gnd <- c("Female", "Male", "Unknown")
set.seed(1234)
dat <- data.frame(
  gender = factor(sample(levels_gnd, 100, replace = TRUE), levels = levels_gnd),
  ethnicity = factor(sample(levels_eth, 100, replace = TRUE), levels = levels_eth),
  outcome1 = sample(c(TRUE, FALSE), 100, replace = TRUE),
  outcome2 = sample(c(TRUE, FALSE), 100, replace = TRUE)
)
dat %>%
  gather(key = outcome, value = outcome_value, contains("outcome")) %>%
  gather(key = predictor, value = pred_value, gender, ethnicity) %>%
  # Statement below creates variable for ordering output
  mutate(
    pred_ord = factor(
      interaction(predictor, addNA(pred_value), sep = "_"),
      levels = c(
        paste("gender", levels(addNA(dat$gender)), sep = "_"),
        paste("ethnicity", levels(addNA(dat$ethnicity)), sep = "_")
      )
    )
  ) %>%
  group_by(pred_ord, outcome) %>%
  summarise(n = sum(outcome_value, na.rm = TRUE)) %>%
  ungroup() %>%
  spread(key = outcome, value = n) %>%
  separate(pred_ord, c("Predictor", "Pred_value"))
Source: local data frame [9 x 4]

  Predictor Pred_value outcome1 outcome2
      (chr)      (chr)    (int)    (int)
1    gender     Female       25       27
2    gender       Male       11       10
3    gender    Unknown       12       15
4 ethnicity      Maori       10        9
5 ethnicity    Pacific        7        7
6 ethnicity      Asian        6       12
7 ethnicity      Other       10        9
8 ethnicity   European        5        4
9 ethnicity    Unknown       10       11
Warning message:
attributes are not identical across measure variables; they will be dropped
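The table above is correct, in that both the predictors and the predictor values appear in the order I defined rather than in alphabetical order.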
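The repeated paste() calls in the levels argument are what grow with the number of predictors. As a minimal sketch, and not a claim that this is the idiomatic dplyr/tidyr answer, the same level vector could be built by a small helper. The name `pred_levels` and the `toy` data frame are hypothetical, introduced only for illustration; the helper uses base R only.

```r
# Hypothetical helper (not part of dplyr or tidyr): builds "<variable>_<level>"
# labels for each predictor, in the order the predictors are listed.
# addNA() appends an NA level, mirroring the mutate() call above.
pred_levels <- function(data, vars) {
  unlist(lapply(vars, function(v) {
    paste(v, levels(addNA(data[[v]])), sep = "_")
  }))
}

# Small stand-in data frame (assumed for illustration only)
toy <- data.frame(
  gender = factor(c("Male", "Female"), levels = c("Female", "Male", "Unknown")),
  ethnicity = factor(c("Asian", "Maori"), levels = c("Maori", "Pacific", "Asian"))
)

lvls <- pred_levels(toy, c("gender", "ethnicity"))
print(lvls)
```

With the real data, the result of `pred_levels(dat, c("gender", "ethnicity"))` could be passed as the `levels` argument when creating `pred_ord`, so adding a predictor only means adding its name to the vector.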
EDIT
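As requested, this is what is produced when the default ordering (alphabetical) is used. It makes sense, because when the factors are combined they are converted to a character variable and all attributes are dropped.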
dat %>%
  gather(key = outcome, value = outcome_value, contains("outcome")) %>%
  gather(key = predictor, value = pred_value, gender, ethnicity) %>%
  group_by(predictor, pred_value, outcome) %>%
  summarise(n = sum(outcome_value, na.rm = TRUE)) %>%
  spread(key = outcome, value = n)
Source: local data frame [9 x 4]

  predictor pred_value outcome1 outcome2
      (chr)      (chr)    (int)    (int)
1 ethnicity      Asian        6       12
2 ethnicity   European        5        4
3 ethnicity      Maori       10        9
4 ethnicity      Other       10        9
5 ethnicity    Pacific        7        7
6 ethnicity    Unknown       10       11
7    gender     Female       25       27
8    gender       Male       11       10
9    gender    Unknown       12       15
Warning message:
attributes are not identical across measure variables; they will be dropped
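The loss of ordering described in the EDIT can be reproduced in isolation. The sketch below uses base R only and imitates the effect by stacking two factors as character vectors; it does not call gather() itself, and the two short example vectors are assumptions made for illustration.

```r
levels_gnd <- c("Female", "Male", "Unknown")
levels_eth <- c("Maori", "Pacific", "Asian", "Other", "European", "Unknown")

gender <- factor(c("Male", "Female"), levels = levels_gnd)
ethnicity <- factor(c("Pacific", "Asian"), levels = levels_eth)

# Stacking the two factors as plain character values discards their levels
stacked <- c(as.character(gender), as.character(ethnicity))
is.character(stacked)

# With no levels left, sorting falls back to alphabetical order
sort(unique(stacked))

# The intended order only comes back if the levels are supplied again
restored <- factor(stacked, levels = unique(c(levels_gnd, levels_eth)))
levels(restored)
```

Because "Unknown" occurs in both level sets, a single combined factor cannot tell the two apart, which is presumably why the workaround above prefixes each value with its predictor name.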