I have some data structured like the following:
          | Works | DoesNotWork |
---------------------------------
Unmarried |   130 |         235 |
Married   |    10 |          95 |
I'm trying to use logistic regression to predict Work Status from Marriage Status, but I don't think I understand how to do this in R. For example, if my data looked like this:
MarriageStatus | WorkStatus |
-----------------------------
Married        | No         |
Married        | No         |
Married        | Yes        |
Unmarried      | No         |
Unmarried      | Yes        |
Unmarried      | Yes        |
I know I can do the following:
log_model <- glm(WorkStatus ~ MarriageStatus, data=MarriageDF, family=binomial(logit))
I just don't understand how to do this when the data is aggregated. Do I need to expand the data into non-aggregated form, coding Married/Unmarried as 0/1 and likewise coding Working/Not Working as 0/1?
Given only the first summary DF, how would I write the logistic regression glm call? Something like this?
log_summary_model <- glm(Works ~ DoesNotWork, data=summaryDF, family=binomial(logit))
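But that doesn't make sense, because I'd be splitting the response (dependent) variable across two columns?
I'm not sure whether I'm overcomplicating this. Any help would be appreciated, thanks!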
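You need to reshape the contingency table into a long-format data frame with one row per cell. You can then fit the logit model using the frequency counts as a weights variable: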
mod <- glm(works ~ marriage, df, family = binomial, weights = freq)
summary(mod)
Call:
glm(formula = works ~ marriage, family = binomial, data = df,
    weights = freq)

Deviance Residuals:
      1        2        3        4
 16.383    6.858  -14.386   -4.361

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.5921     0.1093  -5.416 6.08e-08 ***
marriage     -1.6592     0.3500  -4.741 2.12e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 572.51  on 3  degrees of freedom
Residual deviance: 541.40  on 2  degrees of freedom
AIC: 545.4

Number of Fisher Scoring iterations: 5
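As a rough sanity check on these estimates, the two coefficients can be reproduced by hand from the four cell counts. With a single binary predictor the model fits both group proportions exactly, so the intercept is the log-odds of working for the unmarried group (marriage = 0) and the slope is the log odds ratio of married versus unmarried. The variable names below are made up for illustration.

```r
# Cell counts copied from the contingency table in the question
# (variable names are hypothetical, chosen for this sketch)
works_unmarried     <- 130
not_works_unmarried <- 235
works_married       <- 10
not_works_married   <- 95

# Intercept: log-odds of working in the reference group (marriage = 0)
intercept <- log(works_unmarried / not_works_unmarried)

# Slope: log odds ratio of working, married versus unmarried
slope <- log((works_married / not_works_married) /
             (works_unmarried / not_works_unmarried))

# Should agree with the Estimate column of summary(mod)
round(c(intercept = intercept, marriage = slope), 4)

# Odds ratio: the odds of working for married people relative to unmarried
exp(slope)
```

The odds ratio comes out at roughly 0.19, meaning the odds of working are about one fifth as high for married people as for unmarried people in this table.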
Data:
df <- read.table(text = "works marriage freq
1 0 130
1 1 10
0 0 235
0 1 95", header = TRUE)
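As an alternative sketch, glm also accepts a two-column matrix of (successes, failures) as the response for a binomial family, so the summary table from the question can be used more or less as it stands. The code below assumes the table is stored as summaryDF with a MarriageStatus column, as the question's own attempt suggests. long_df, expanded_df and expanded_model are hypothetical names, used only to show that fully expanding the data to one row per person gives the same coefficients.

```r
# The summary table from the question, one row per marriage status.
# Setting the factor levels explicitly makes "Unmarried" the reference
# group, matching marriage = 0 in the weighted model.
summaryDF <- data.frame(
  MarriageStatus = factor(c("Unmarried", "Married"),
                          levels = c("Unmarried", "Married")),
  Works       = c(130, 10),
  DoesNotWork = c(235, 95)
)

# Two-column response: successes (Works) first, failures (DoesNotWork) second
log_summary_model <- glm(cbind(Works, DoesNotWork) ~ MarriageStatus,
                         data = summaryDF, family = binomial)
coef(log_summary_model)

# For comparison: rebuild one row per person from the long-format counts
long_df <- data.frame(works    = c(1, 1, 0, 0),
                      marriage = c(0, 1, 0, 1),
                      freq     = c(130, 10, 235, 95))
expanded_df <- long_df[rep(seq_len(nrow(long_df)), long_df$freq),
                       c("works", "marriage")]

# Ordinary 0/1 logistic regression on the expanded data
expanded_model <- glm(works ~ marriage, data = expanded_df, family = binomial)
coef(expanded_model)
```

Both fits should return the same two coefficients as the weighted model. The reported deviances and degrees of freedom differ between the three forms because each one counts the observations differently.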