Outline

Review

Equations for a line

library(ggplot2)

# Two points that lie on the solid blue line
range <- data.frame(x = c(-2,0), y = c(0, 1))
ggplot(range, aes(x, y)) + ylim(c(-3,3)) + xlim(c(-3,3)) +
  geom_point(size = 2) +
  # Solid blue line: y = 1 + x/2
  geom_abline(slope = 1/2, intercept = 1, color = "blue") +
  # Dashed line: y = -1 - x
  geom_abline(slope = -1, intercept = -1, linetype = 2) +
  theme_minimal()
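
As a worked check of the plot above, the slope and intercept of the solid blue line can be recovered from the two plotted points \((-2, 0)\) and \((0, 1)\). This is only a review sketch of the slope-intercept form, using the values that appear in the code:

\[ \text{slope} = \frac{1 - 0}{0 - (-2)} = \frac{1}{2}, \qquad \text{intercept} = y \text{ at } x = 0 = 1, \qquad \text{so } y = 1 + \frac{1}{2} x \]

The dashed line \(y = -1 - x\) passes through neither point: at \(x = 0\) it gives \(y = -1\), not \(1\).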

Coefficient estimation

The coefficients \(\beta_0, \beta_1\) and the errors \(e_i\) are unknown

  • Everything we’ve learned so far about estimation applies here!
  • Goal: use the data to calculate sample estimates of coefficients: \(\hat \beta_0\) and \(\hat \beta_1\)
  • Previously we saw some regression lines calculated for us by the lm function in R
  • How does that work? How can we do it ourselves?
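
library(dplyr)      # filter()
library(gapminder)  # gapminder data
library(ggplot2)

# Keep only the 2007 observations (year is stored as a number)
gm2007 <- filter(gapminder, year == 2007)
ggplot(gm2007, aes(gdpPercap, lifeExp)) +
  geom_point() + theme_minimal() +
  ggtitle("How can we compute the regression line?")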

Slope: calculating from sample correlations and standard deviations

  • Last time we learned that the slope \(\beta_1\) has the same sign as the correlation between \(x\) and \(y\) (both are positive or both are negative)
  • Let \(r = \text{cor}(x, y)\), let \(s_x\) be the sample standard deviation of \(x\), and let \(s_y\) be the sample standard deviation of \(y\)
  • The exact relationship is this: \[ \hat \beta_1 = r \cdot \frac{s_y}{s_x} \]
  • This is an important relationship to remember
  • We’ll come back to it when we think about interpretation
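
The slope formula above can be checked numerically. The following is a minimal sketch, assuming a small made-up data set (the vectors `x` and `y` are illustrative and not from the lecture data). It computes \(r \cdot s_y / s_x\) by hand and compares it with the slope reported by `lm`.

```r
# Illustrative data (made up for this example, not from gapminder)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 2.9, 4.2, 4.8, 6.1, 6.9)

r   <- cor(x, y)  # sample correlation
s_x <- sd(x)      # sample standard deviation of x
s_y <- sd(y)      # sample standard deviation of y

# Slope estimate from the correlation and the standard deviations
beta1_hat <- r * s_y / s_x

# Slope estimate from lm, for comparison
fit <- lm(y ~ x)
beta1_lm <- unname(coef(fit)["x"])

# TRUE if the two estimates agree up to floating-point error
all.equal(beta1_hat, beta1_lm)
```

Because \(s_x\) and \(s_y\) are always positive, the sign of `beta1_hat` is the sign of `r`, which matches the first bullet above.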

Intercept: calculating from sample means

  • Now that we know the slope, a single point on the line is enough to pin the line down, using the point-slope equation of a line
  • Let \(\bar x\) and \(\bar y\) be the sample means of the \(x\) and \(y\) variables
  • The estimated regression line always passes through one interesting point: the mean of the data \((\bar x, \bar y)\)
  • So if \(\hat \beta_0\) and \(\hat \beta_1\) are the sample linear regression coefficients, we know \[ \bar y = \hat \beta_0 + \hat \beta_1 \bar x \]
  • We can use this to calculate \(\hat \beta_0\) if we already know \(\hat \beta_1\) and the means: \[ \hat \beta_0 = \bar y - \hat \beta_1 \bar x \]
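
The two steps above (slope first, then intercept from the means) can be sketched in R as follows. The data are made up for illustration, and the names `beta0_hat` and `beta1_hat` are just labels chosen for this example.

```r
# Illustrative data (made up for this example, not from gapminder)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 2.9, 4.2, 4.8, 6.1, 6.9)

# Step 1: slope from the correlation and the standard deviations
beta1_hat <- cor(x, y) * sd(y) / sd(x)

# Step 2: intercept chosen so the line passes through (x_bar, y_bar)
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Compare both estimates with the coefficients from lm
fit <- lm(y ~ x)
all.equal(c(beta0_hat, beta1_hat), unname(coef(fit)))

# The estimated line evaluated at x_bar gives back y_bar
all.equal(beta0_hat + beta1_hat * mean(x), mean(y))
```

To try the same calculation on the 2007 gapminder data, one could set `x <- gm2007$gdpPercap` and `y <- gm2007$lifeExp`, assuming `gm2007` has been created as in the earlier plot.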

Interpretation