It's all regression

May 10, 2019

We almost always work with variables that are either …

  • quantitative
  • categorical with only two levels

When relating two variables, there are four possible situations:

| Response ↓ / Explanatory → | Quantitative | Categorical |
|---|---|---|
| Quantitative | slope of regression line or correlation coefficient | difference in two means |
| Categorical | ?????? | difference in two proportions |

Presentation in terms of algebra

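| | Quantitative | Categorical |
|---|---|---|
| Quantitative | $b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$ | $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$ |
| Categorical | ?????? | $z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{\bar{p}\bar{q}}{n_1} + \frac{\bar{p}\bar{q}}{n_2}}}$ |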

… and the associated distributions: t and z.
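To make the "it's all regression" point concrete, here is a minimal Python sketch that applies the slope formula $b_1$ from the table to a two-level categorical explanatory variable. With a quantitative response the slope equals the difference in two means. With a 0/1 response it equals the difference in two proportions. The data, the helper names `slope_b1` and `group_mean`, and the choice to code the two levels as 0 and 1 are all assumptions for illustration (other codes would rescale the slope).

```python
def slope_b1(x, y):
    """Least-squares slope using the textbook formula
    b1 = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


def group_mean(values, groups, level):
    """Mean of `values` over the cases whose group code equals `level`."""
    selected = [v for v, g in zip(values, groups) if g == level]
    return sum(selected) / len(selected)


# Hypothetical data: a two-level categorical explanatory variable coded 0/1.
group = [0, 0, 0, 1, 1, 1, 1]

# Quantitative response: the slope is the difference in two means.
height = [150.0, 155.0, 160.0, 165.0, 170.0, 172.0, 173.0]
print(slope_b1(group, height))                                          # 15.0
print(group_mean(height, group, 1) - group_mean(height, group, 0))      # 15.0

# Categorical response coded 0/1: the slope is the difference in two proportions.
group2 = [0, 0, 0, 0, 1, 1, 1, 1]
outcome = [0, 0, 0, 1, 1, 1, 1, 0]
print(slope_b1(group2, outcome))                                        # 0.5
print(group_mean(outcome, group2, 1) - group_mean(outcome, group2, 0))  # 0.5
```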

Data beats formulas

[Figure: 2 × 2 grid of data plots, one panel per situation; rows Quant. and Categ., columns Quantitative and Categorical]

Raw and model values

The error bars show the size of the standard deviation.

The formal inference procedure

  1. Measure the effect size. Call it β:
    • slope for Quant vs Quant or Cat vs Quant
    • difference for Quant vs Cat or Cat vs Cat
  2. Take the ratio of the model-value standard deviation to the raw-value standard deviation. Call this ratio R.

  3. Compute the ratio of “explained” to “unexplained”: $F = (n-1)\,\frac{R^2}{1-R^2}$
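Steps 1 to 3 can be sketched in Python for a single explanatory variable. This assumes the model is an ordinary least-squares line, with any categorical variable coded 0/1 so that the slope is the group difference. The function name `inference_summary` and the example data are hypothetical. Step 3 uses the $(n-1)$ form given in the text.

```python
import statistics


def inference_summary(x, y):
    """Steps 1-3 for one explanatory variable x and response y
    (either may be a 0/1-coded categorical variable). Returns (beta, R, F)."""
    n = len(x)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)

    # Step 1: effect size = least-squares slope (a difference when x is 0/1).
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x

    # Step 2: R = sd(model values) / sd(raw values).
    model_values = [intercept + beta * xi for xi in x]
    R = statistics.stdev(model_values) / statistics.stdev(y)

    # Step 3: F = (n - 1) * R^2 / (1 - R^2), as stated in the text.
    F = (n - 1) * R ** 2 / (1 - R ** 2)
    return beta, R, F


# Hypothetical example: group coded 0/1, height in cm.
group = [0, 0, 0, 1, 1, 1, 1]
height = [150.0, 155.0, 160.0, 165.0, 170.0, 172.0, 173.0]
beta, R, F = inference_summary(group, height)
print(f"beta = {beta:.2f}, R = {R:.3f}, F = {F:.1f}")
```

Because both standard deviations use the same divisor, R does not depend on whether `stdev` or `pstdev` is used.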

Eyeballing the four models (each with n = 200):

| | Quantitative | Categorical |
|---|---|---|
| Quantitative | β = 1 cm / 3 years; R ≈ 0.5; F = 24.8 | β ≈ 8 cm; R ≈ 1/3; F ≈ 66 |
| Categorical | β = 0.007 per year; R ≈ 0.12; F ≈ 8.3 | β ≈ 0.2; R ≈ 0.25; F ≈ 13.3 |
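As a worked check of step 3, take the Categorical vs Categorical cell, where R ≈ 0.25 and n = 200 (so n − 1 = 199):

```latex
F = (n-1)\,\frac{R^2}{1-R^2}
  = 199 \times \frac{0.25^2}{1 - 0.25^2}
  = 199 \times \frac{0.0625}{0.9375}
  \approx 13.3
```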

 

  4. The (roughly 95%) confidence interval on β is always $CI_\beta = \beta\left(1 \pm 2/\sqrt{F}\right)$.
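The interval follows from $t = \sqrt{F}$. Since $t = \beta / SE$, the standard error is $SE = \beta / \sqrt{F}$, and $\beta \pm 2\,SE$ factors into $\beta(1 \pm 2/\sqrt{F})$. A minimal sketch, where `ci_from_F` is a hypothetical helper and 2 is the usual rough multiplier for a 95% interval:

```python
import math


def ci_from_F(beta, F, multiplier=2.0):
    """Approximate confidence interval on beta from its F statistic.

    With t = sqrt(F), the standard error of beta is |beta| / t, so
    beta +/- multiplier * SE = beta * (1 +/- multiplier / sqrt(F)).
    Using |beta| keeps (low, high) in order when beta is negative.
    """
    half_width = abs(beta) * multiplier / math.sqrt(F)
    return beta - half_width, beta + half_width


# Eyeballed Categorical vs Categorical values: beta ~ 0.2, F ~ 13.3
low, high = ci_from_F(0.2, 13.3)
print(f"{low:.2f} to {high:.2f}")  # 0.09 to 0.31
```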

  5. p-value … look it up in this graph.
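Without the graph, a rough p-value can also be computed directly. This sketch assumes a single β (one numerator degree of freedom) and an n large enough, like the n = 200 here, that $t = \sqrt{F}$ is close to standard normal. For small n the exact t or F distribution should be used instead. `approx_p_value` is a hypothetical helper name.

```python
import math


def approx_p_value(F):
    """Rough two-sided p-value for an F statistic with one numerator degree
    of freedom, using t = sqrt(F) and treating t as standard normal
    (a large-n approximation)."""
    t = math.sqrt(F)
    # For a standard normal Z, P(|Z| > t) = erfc(t / sqrt(2)).
    return math.erfc(t / math.sqrt(2))


for F in (4.0, 8.3, 13.3):
    print(F, approx_p_value(F))
```

F = 4 corresponds to t = 2, the familiar p ≈ 0.05 boundary.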

Freebies:

  • Correlation coefficient: $r = \pm\sqrt{R^2}$, with the sign taken from the sign of the slope.
  • Prefer t? It’s $t = \sqrt{F}$. But F is more general than t.
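The first freebie can be checked numerically. For a least-squares line with one explanatory variable, R computed as a ratio of standard deviations equals |r|, so attaching the slope's sign recovers r. A minimal sketch with made-up data; `signed_R` and `pearson_r` are hypothetical helper names.

```python
import math
import statistics


def signed_R(x, y):
    """R = sd(model values) / sd(raw values), given the sign of the slope."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    model_values = [mean_y + slope * (xi - mean_x) for xi in x]
    R = statistics.pstdev(model_values) / statistics.pstdev(y)
    return math.copysign(R, slope)


def pearson_r(x, y):
    """Ordinary Pearson correlation coefficient, for comparison."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_up = [2.0, 1.0, 4.0, 3.0, 6.0]    # positive slope
y_down = [6.0, 3.0, 4.0, 1.0, 2.0]  # negative slope
for y in (y_up, y_down):
    print(round(signed_R(x, y), 6), round(pearson_r(x, y), 6))
```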