Announcements

Wanted: an interval, or range, to cover the true parameter

Derivation: recall the t-distribution
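
A sketch of the derivation, assuming the sample is drawn independently from a population with mean \(\mu\) and that \(n\) is large enough (or the population close enough to normal) for the t approximation to be reasonable. The symbols \(T\), \(\mu\), \(\bar{X}\), and \(s\) are notation introduced here for illustration; \(SE\) and \(c\) match the names used in the code on the later slides.

```latex
\[
T = \frac{\bar{X} - \mu}{SE}, \qquad SE = \frac{s}{\sqrt{n}}, \qquad T \overset{\text{approx.}}{\sim} t_{n-1}
\]
\[
0.95 = P(-c \le T \le c) = P\left(\bar{X} - c \cdot SE \le \mu \le \bar{X} + c \cdot SE\right)
\]
```

Rearranging the inequality inside the probability is what turns a statement about \(T\) into an interval for \(\mu\).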

How to compute the constant \(c\)

qt(0.025, 9) # lower 2.5% quantile, df = n - 1 = 9
## [1] -2.262157
qt(0.975, 99) # upper 97.5% quantile, df = n - 1 = 99; this is c when n = 100
## [1] 1.984217
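
One way to check the choice of quantile, as a minimal sketch: if \(c\) is the 97.5% quantile of the t-distribution, the central region between \(-c\) and \(c\) should hold 95% of the probability, and the 2.5% quantile should be \(-c\) by symmetry. The names `c_val`, `central`, and `symmetric` are illustrative and do not appear in the original slides.

```r
df <- 9                                    # degrees of freedom, n - 1 for n = 10
c_val <- qt(0.975, df)                     # upper 97.5% quantile
central <- pt(c_val, df) - pt(-c_val, df)  # P(-c <= T <= c)
symmetric <- all.equal(qt(0.025, df), -c_val)  # lower quantile mirrors the upper one

print(central)
print(symmetric)
```

Using `pt()` here is simply the inverse of `qt()`: it maps a cutoff back to a cumulative probability.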

Putting it all together

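box <- c(1,1,1,5)                    # population: the tickets in the box
n <- 50                              # sample size
X <- sample(box, n, replace = TRUE)  # draw n tickets with replacement
Xbar <- mean(X)                      # sample mean
SE <- sd(X)/sqrt(n)                  # estimated standard error of the mean
c <- qt(0.975, n-1)                  # t quantile for a 95% interval, df = n - 1
c(Xbar - c*SE, Xbar + c*SE)          # calling c() still finds the function, not the number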
## [1] 1.404309 2.355691
# Does the interval cover the true value?
mean(box)
## [1] 2

Let R calculate the interval for you

# Compare to our manually calculated interval
c(Xbar - c*SE, Xbar + c*SE)
## [1] 1.404309 2.355691
t.test(X)$conf.int
## [1] 1.404309 2.355691
## attr(,"conf.level")
## [1] 0.95
t.test(X, conf.level = 0.99)$conf.int
## [1] 1.245623 2.514377
## attr(,"conf.level")
## [1] 0.99
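
The manual calculation can be generalized to other confidence levels, which makes the link to `conf.level` explicit. The following is a sketch under stated assumptions: `t_interval` is a hypothetical helper (not from the original slides), and the seed is arbitrary.

```r
# Hypothetical helper: two-sided t interval for the mean at a given level
t_interval <- function(x, level = 0.95) {
  n <- length(x)
  se <- sd(x) / sqrt(n)                  # estimated standard error
  alpha <- 1 - level                     # total probability left in the two tails
  crit <- qt(1 - alpha / 2, df = n - 1)  # e.g. qt(0.995, n - 1) for level = 0.99
  mean(x) + c(-1, 1) * crit * se
}

set.seed(1)  # arbitrary seed so the sketch is reproducible
x <- sample(c(1, 1, 1, 5), 50, replace = TRUE)
manual <- t_interval(x, level = 0.99)
builtin <- as.numeric(t.test(x, conf.level = 0.99)$conf.int)  # drop the conf.level attribute
print(all.equal(manual, builtin))
```

A higher level moves the quantile further into the tail, so the 99% interval is wider than the 95% one for the same sample.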
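# Draw a fresh sample and return its 95% interval
# (relies on box, n, and the quantile c defined earlier)
experiment <- function() {
  X <- sample(box, n, replace = TRUE)
  Xbar <- mean(X)
  SE <- sd(X)/sqrt(n)
  c(Xbar - c*SE, Xbar + c*SE)
}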
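library(ggplot2)   # ggplot(), geom_errorbar(), geom_hline(), coord_flip()
library(ggthemes)  # theme_tufte()
# One row per simulated sample: lower and upper end of its interval
CIs <- data.frame(t(replicate(20, experiment())))
names(CIs) <- c("Lower", "Upper")
CIs$SampleNumber <- 1:20
# One error bar per interval; the reference line marks the true mean, mean(box) = 2
ggplot() +
  geom_errorbar(data = CIs, aes(x = SampleNumber, ymin = Lower, ymax = Upper)) +
  geom_hline(yintercept = 2) + coord_flip() + theme_tufte()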

Interpretation
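
A 95% confidence level describes the procedure, not any single interval: over many repeated samples, roughly 95% of the intervals should cover the true mean. The sketch below estimates that coverage by simulation for the box used on the earlier slides. The names `crit`, `covers`, and `coverage`, the seed, and the number of repetitions are choices made for illustration. Because the box is skewed and \(n\) is finite, the estimate should land near 0.95 rather than exactly on it.

```r
set.seed(42)                # arbitrary seed for reproducibility
box <- c(1, 1, 1, 5)        # same population as in the earlier slides
n <- 50
crit <- qt(0.975, n - 1)    # quantile for a 95% interval

# TRUE if the interval from one fresh sample covers the true mean
covers <- function() {
  X <- sample(box, n, replace = TRUE)
  SE <- sd(X) / sqrt(n)
  abs(mean(X) - mean(box)) <= crit * SE
}

coverage <- mean(replicate(2000, covers()))  # fraction of intervals that cover
print(coverage)
```

Any one interval either covers the true mean or it does not; the 95% refers to the long-run fraction estimated by `coverage`.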

Practical considerations