The dataset contains three variables: id, sex, and grade (a factor).
mydata <- data.frame(id=c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4), sex=c(1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1),
grade=c("a","b","c","d","e", "x","y","y","x", "q","q","q","q", "a", "a", "a", NA, "b"))
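For each ID, I need to count how many unique grades there are and then create a new column (called N) that records this count. For example, for ID=1 there are five unique values of "grade", so N = 5; for ID=2 there are two unique values of "grade", so N = 2; for ID=4 there are two unique values of "grade" (ignoring NA), so N = 2.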
The final dataset should be:
mydata <- data.frame(id=c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4), sex=c(1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1),
grade=c("a","b","c","d","e", "x","y","y","x", "q","q","q","q", "a", "a", "a", NA, "b"))
mydata$N <- c(5,5,5,5,5,2,2,2,2,1,1,1,1,2,2,2,2,2)
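
One possible way to compute N rather than typing it by hand is sketched below in base R. It is only a sketch: it assumes that NA should never count as a grade and that N should be repeated on every row of the same id, as in the desired output above. The helper name count_unique_grades is made up for illustration.

```r
# Rebuild the example data from the question.
mydata <- data.frame(
  id    = c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4),
  sex   = c(1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1),
  grade = c("a","b","c","d","e", "x","y","y","x", "q","q","q","q",
            "a","a","a",NA,"b")
)

# Hypothetical helper: given the row indices of one id group,
# count the distinct non-missing grades in those rows.
count_unique_grades <- function(rows) {
  g <- mydata$grade[rows]
  length(unique(g[!is.na(g)]))
}

# ave() splits the row indices by id, applies the helper to each group,
# and writes each group's result back to every row of that group.
# Passing row indices (integers) instead of the grade column keeps the
# result numeric rather than coercing it to character.
mydata$N <- ave(seq_len(nrow(mydata)), mydata$id, FUN = count_unique_grades)

print(mydata)
```

If dplyr is available, the same idea can be written as a grouped mutate using n_distinct(grade, na.rm = TRUE); the base R version is shown here only to avoid an extra dependency.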