May 10, 2019
We almost always work with variables that are either …
- quantitative
- categorical with only two levels
In relating two variables, there are four possible situations:
| | Quantitative | Categorical |
|---|---|---|
| Quant. | slope of regression line or correlation coefficient | difference in two means |
| Categ. | ?????? | difference in two proportions |
Presentation in terms of algebra
| | Quantitative | Categorical |
|---|---|---|
| Quant. | \(b_1 = \frac{n\sum xy - \sum x \cdot \sum y}{n \sum x^2 - (\sum x)^2}\) | \(t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}\) |
| Categ. | ?????? | \(z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{\bar{p}\bar{q}}{n_1} + \frac{\bar{p}\bar{q}}{n_2}}}\) |
… and associated distributions: t and z
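For anyone who wants to check the algebra against data, here is a minimal Python sketch of two of the formulas in the table: the slope \(b_1\) and the pooled two-sample \(t\). The function names are made up for illustration, and the pooled variance \(s_p^2\) uses the standard textbook definition, which the table itself does not spell out.

```python
from math import sqrt
from statistics import mean, variance


def slope_b1(x, y):
    """Least-squares slope, written term by term as in the b_1 formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


def pooled_t(sample1, sample2, null_diff=0.0):
    """Two-sample t statistic using a pooled variance s_p^2.

    null_diff plays the role of (mu_1 - mu_2) under the null hypothesis.
    """
    n1, n2 = len(sample1), len(sample2)
    # Assumed (standard) pooled variance: weighted average of sample variances.
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    return (mean(sample1) - mean(sample2) - null_diff) / sqrt(sp2 / n1 + sp2 / n2)
```

Each cell of the table needs its own function like these, which is part of the motivation for the single procedure that follows.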
Data beats formulas
[Figure grid, rows Quant. and Categ., columns Quantitative and Categorical: one plot per combination.]
Raw and model values
The error bars show the size of the standard deviation.
The formal inference procedure
- Measure the effect size. Call it \(\beta\):
    - slope for Quant vs Quant or Cat vs Quant
    - difference for Quant vs Cat or Cat vs Cat
- Take the ratio of the model-value standard deviation to the raw-value standard deviation. Call this ratio \(R\).
- Compute the ratio of “explained” to “unexplained”: \[F = (n-1) \frac{R^2}{1 - R^2}\]
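The steps above can be sketched in a few lines of Python for the Quant vs Cat case, where the model values are just the group means. This is an illustrative sketch, not the code behind the plots: the function names and the two-level grouping convention are assumptions.

```python
from statistics import mean, pstdev


def f_from_r(r, n):
    """F = (n - 1) * R^2 / (1 - R^2), as in the procedure above."""
    return (n - 1) * r ** 2 / (1 - r ** 2)


def effect_r_f(response, group):
    """Return (beta, R, F) for a quantitative response and a two-level group.

    Assumes `group` has exactly two distinct labels; beta is the mean of the
    second label (in sorted order) minus the mean of the first.
    """
    levels = sorted(set(group))
    means = {
        lev: mean(v for v, g in zip(response, group) if g == lev)
        for lev in levels
    }
    beta = means[levels[1]] - means[levels[0]]
    # Model value for each case is its group mean.
    model = [means[g] for g in group]
    # Ratio of model-value sd to raw-value sd.
    r = pstdev(model) / pstdev(response)
    return beta, r, f_from_r(r, len(response))
```

As a check on the arithmetic, `f_from_r(0.25, 200)` is 199 × 0.0625 / 0.9375, or about 13.3.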
Eyeballing from the four models (which had n = 200)…
| | Quantitative | Categorical |
|---|---|---|
| Quantitative | \(\beta = \frac{1\ cm}{3\ years}\) | \(\beta \approx 8\ cm\) |
| | \(R \approx 0.5\) | \(R \approx 1/3\) |
| | \(F = 24.8\) | \(F \approx 66\) |
| Categorical | \(\beta = 0.007\) per year | \(\beta \approx 0.2\) |
| | \(R \approx 0.12\) | \(R \approx 0.25\) |
| | \(F \approx 8.3\) | \(F \approx 13.3\) |
The (approximately 95%) confidence interval on \(\beta\) is always \[CI_\beta = \beta (1 \pm 2 / \sqrt{F})\]
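As a sketch, the interval formula translates directly into Python. The function name `ci_beta` is an assumption for illustration, and the absolute value is there only to keep the endpoints in order when \(\beta\) is negative.

```python
from math import sqrt


def ci_beta(beta, f):
    """Return (lower, upper) for beta * (1 ± 2 / sqrt(F))."""
    half_width = 2 * abs(beta) / sqrt(f)
    return beta - half_width, beta + half_width
```

Plugging in \(\beta \approx 8\) cm with \(F \approx 66\) gives an interval of roughly 6 to 10 cm.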
p-value … Look up in this graph.
Freebies:
- correlation coefficient \(r = \pm\sqrt{R^2}\), where the sign is taken from the sign of the slope.
- Prefer t? It’s \(t = \sqrt{F}\). But F is more general than t.
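Both freebies are one-liners. A minimal sketch, with hypothetical function names:

```python
from math import copysign, sqrt


def signed_r(big_r, slope):
    """Correlation coefficient: magnitude R, sign borrowed from the slope."""
    return copysign(big_r, slope)


def t_from_f(f):
    """Magnitude of the t statistic corresponding to F."""
    return sqrt(f)
```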