将阴影标准误差曲线添加到ggplot2中的geom_密度

2023-12-29

我想添加阴影标准误差曲线geom_density using ggplot2。我的代码如下所示：

data.plot <- data.frame(x = c(rnorm(100, mean = 0, sd = 5),
                                    rnorm(100, mean = 1, sd =2 )), 
                             g = factor(c(rep(1, 100),
                                        rep(2,100))))

ggplot(data.plot, aes(x, linetype = g)) + geom_density()

我找不到这样做的教程或示例。谢谢。

最好的解决方案是引导，如评论中所述。我会用经典的iris数据，关注密度萼片长度对于每个Species.

生成样本

library(dplyr)
library(broom)
library(tidyr)
library(ggplot2)


data_frame(bs = 1:1000) %>% group_by(bs) %>% 
  mutate(data = list(iris %>% group_by(Species) %>% 
                       sample_frac(size = 1, replace = T)))

# A tibble: 1,000 x 2
# Groups:   bs [1,000]
      bs               data
   <int>             <list>
 1     1 <tibble [150 x 5]>
 2     2 <tibble [150 x 5]>
 3     3 <tibble [150 x 5]>
 4     4 <tibble [150 x 5]>
 5     5 <tibble [150 x 5]>
 6     6 <tibble [150 x 5]>
 7     7 <tibble [150 x 5]>
 8     8 <tibble [150 x 5]>
 9     9 <tibble [150 x 5]>
10    10 <tibble [150 x 5]>
# ... with 990 more rows

因此，我只是对原始数据进行了 1000 次引导复制，从每组中取出与其原始样本大小相同的行数，并进行替换。现在我必须unnest访问嵌套中的数据data column.

计算样本内密度

densities.within <-
data_frame(bs = 1:1000) %>% group_by(bs) %>% 
  mutate(data = list(iris %>% group_by(Species) %>% 
                       sample_frac(size = 1, replace = T))) %>% 
  unnest() %>% 
  group_by(bs, Species) %>% 
  do(tidy(density(.$Sepal.Length, 
                  from = min(iris$Sepal.Length), 
                  to = max(iris$Sepal.Length), 
                  n = 128)))

# A tibble: 384,000 x 4
# Groups:   bs, Species [30,000]
      bs Species        x         y
   <int>  <fctr>    <dbl>     <dbl>
 1     1  setosa 4.300000 0.2395786
 2     1  setosa 4.328346 0.2821128
 3     1  setosa 4.356693 0.3235939
 4     1  setosa 4.385039 0.3632449
 5     1  setosa 4.413386 0.4010378
 6     1  setosa 4.441732 0.4375189
 7     1  setosa 4.470079 0.4734727
 8     1  setosa 4.498425 0.5095333
 9     1  setosa 4.526772 0.5459280
10     1  setosa 4.555118 0.5824587
# ... with 383,990 more rows

因此，我们将数据扩展为长形式，然后计算每个组的密度萼片长度 within Species within bs。我们必须提供手册from = and to =因为每个引导程序中的最小值和最大值可能不同（并设置较低的n =比默认的 512 只是为了节省时间）。为了简化S3: density生成的对象，我们使用broom::tidy。这是计算密集型步骤，因此我们将此对象保存为内密度.

将密度汇总为分位数

这会产生名为x and y，但我们将重命名它们以匹配我们的数据。然后我们会计算出：对于每个计算出的可能的密度萼片长度，CI 的下限、中位数和 CI 的上限是多少？我们将使用quantile以获得计算密度的这些特定值。

densities.qtiles <-
densities.within %>%
  rename(Sepal.Length = x, dens = y) %>%
  ungroup() %>%
  group_by(Species, Sepal.Length) %>% 
  summarise(q05 = quantile(dens, 0.025),
            q50 = quantile(dens, 0.5),
            q95 = quantile(dens, 0.975))

# A tibble: 384 x 5
# Groups:   Species [?]
   Species Sepal.Length        q05       q50       q95
    <fctr>        <dbl>      <dbl>     <dbl>     <dbl>
 1  setosa     4.300000 0.05730022 0.2355335 0.4426299
 2  setosa     4.328346 0.08177850 0.2734463 0.4970097
 3  setosa     4.356693 0.09863062 0.3114570 0.5505578
 4  setosa     4.385039 0.12459033 0.3430645 0.5884523
 5  setosa     4.413386 0.15049699 0.3705389 0.6207344
 6  setosa     4.441732 0.17494889 0.4006335 0.6418923
 7  setosa     4.470079 0.19836510 0.4258006 0.6655006
 8  setosa     4.498425 0.21106857 0.4555755 0.6971370
 9  setosa     4.526772 0.23399070 0.4813130 0.7244413
10  setosa     4.555118 0.24863090 0.5108057 0.7708114
# ... with 374 more rows

可视化和比较

ggplot(densities.qtiles, aes(Sepal.Length, q50)) +
  facet_wrap(~Species, nrow = 2) +
  geom_histogram(data = iris, 
                 aes(Sepal.Length, ..density..), 
                 colour = "black", fill = "white", 
                 binwidth = 0.25, boundary = 0) +
  geom_ribbon(aes(ymin = q05, ymax = q95), alpha = 0.5, fill = "grey50") +
  stat_density(data = iris, 
               aes(Sepal.Length, ..density.., color = "raw density"), 
               size = 2, geom = "line") +
  geom_line(size = 1.5, aes(color = "bootstrapped")) + 
  scale_color_manual(values = c("red", "black")) +
  labs(y = "density") +
  theme(legend.position = c(0.5,0),
        legend.justification = c(0,0))

我还包括原始数据的直方图和密度层以进行比较。您可以看到，1000 个引导样本的中值密度和原始密度非常接近。

本文内容由网友自发贡献，版权归原作者所有，本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容，请联系:hwhale#tublm.com(使用前将#替换为@)