#### Outline

• Statistical vs practical significance
• How large a sample is enough?

## Statistical significance and practical significance

• Suppose you read this headline: Diet X is associated with lower risk of cancer
• You check out the study: the null hypothesis is no association, and the $$p$$-value is $$<0.00001$$
• Very significant result!
• But what if the risk reduction was, e.g., from 2.5% to 2.47% risk?
• The result is highly statistically significant, but not very practically significant
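
As a rough sketch of how such a tiny risk reduction can still produce $$p < 0.00001$$, the snippet below estimates the per-group sample size at which a two-sided two-proportion z-test would be expected to reach that threshold. It assumes two equally sized groups and treats 2.5% and 2.47% as the true risks; the variable names are illustrative and do not come from any actual study.

```r
# Assumed true risks in the two groups (taken from the headline example)
p_control <- 0.025
p_diet    <- 0.0247
alpha     <- 0.00001

# Critical value for a two-sided z-test at level alpha
z_crit <- qnorm(1 - alpha / 2)

# The variance of the difference in sample proportions is var_sum / n
var_sum <- p_control * (1 - p_control) + p_diet * (1 - p_diet)

# Smallest n per group at which the expected z statistic reaches z_crit
n_per_group <- ceiling(z_crit^2 * var_sum / (p_control - p_diet)^2)
n_per_group
```

Under these assumptions the answer is on the order of ten million people per group: the tiny $$p$$-value reflects the sample size, not the size of the effect.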

• Note: to keep the formulas simple, we assume the variances are known and equal to 1, so we use the normal distribution instead of the t-distribution
• In practice, the ratio $$\mu/\sigma$$ is what matters. We’ll come back to this at the end

• Let’s be more mathematical: consider an example for differences between groups
• Suppose the true difference is $$\mu_1 - \mu_2 = 0.01$$
• If the sample size is very large, the test will reject the null hypothesis
• But is that really useful information? It depends on whether $$0.01$$ is large enough to be of any practical importance

# One million draws per group; the true means differ by only 0.01
group1 <- rnorm(1000000, mean = 0.01)
group2 <- rnorm(1000000, mean = 0)
t.test(group1, group2)
##
##  Welch Two Sample t-test
##
## data:  group1 and group2
## t = 7.1219, df = 2e+06, p-value = 1.065e-12
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.007300852 0.012845106
## sample estimates:
##   mean of x   mean of y
## 0.011239436 0.001166458
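library(ggplot2)   # ggplot(), stat_function()
library(ggthemes)  # theme_tufte()

# Overlay the two population densities on [-2, 2]
range <- data.frame(x = c(-2, 2))
ggplot(range, aes(x)) +
  stat_function(fun = dnorm, args = list(mean = 0.01, sd = 1)) +
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1)) +
  theme_tufte() +
  ggtitle("Two significantly different distributions?")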
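
To see why the test above rejects so decisively, it may help to compute the expected z statistic directly. A minimal sketch, assuming known unit variances and equal group sizes $$n$$ (the helper `expected_z` is illustrative):

```r
# Expected z statistic for a difference in means with unit variances:
# the standard error of the difference is sqrt(2 / n)
expected_z <- function(n, delta = 0.01) delta / sqrt(2 / n)

n_values <- 10^(2:6)
data.frame(n = n_values, expected_z = expected_z(n_values))
```

With $$n = 10^6$$ per group the standard error is about 0.0014, so a difference of 0.01 sits roughly 7 standard errors from zero, in line with the t statistic in the output above.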

• Below we consider an even smaller effect of $$\mu = 0.001$$
• We plot the probability of rejecting the null at the 5% significance level (one-sided test) as a function of the sample size $$n$$
# Critical value of the one-sided z-test at the 5% level
c_null <- qnorm(.95)
mu <- 0.001
# Power with known unit variance: P(Z > c_null - mu * sqrt(n))
powern <- function(n) {
  1 - pnorm(c_null - mu * sqrt(n))
}
range <- data.frame(n = 10^c(1:7))
ggplot(range, aes(n)) +
  stat_function(fun = powern) + theme_tufte() +
  ylab("Power") +
  ggtitle("Power as a function of sample size, mu = 0.001")
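
The power formula can also be inverted to ask how large $$n$$ must be to reach a target power. A sketch under the same assumptions (one-sided test at the 5% level, unit variance); the 80% target and the name `n_for_power` are choices made here for illustration:

```r
c_null <- qnorm(.95)
mu <- 0.001

# Solve 1 - pnorm(c_null - mu * sqrt(n)) = target for n
n_for_power <- function(target, mu) ((c_null + qnorm(target)) / mu)^2

n80 <- n_for_power(0.80, mu)
n80

# Check: plugging n80 back into the power formula recovers the target
1 - pnorm(c_null - mu * sqrt(n80))
```

For an effect this small, roughly six million observations are needed to detect it 80% of the time.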

• Hypothesis tests are still useful if you must make a decision, e.g. A/B testing, summarizing the conclusion of a scientific study, etc.
• But beware: with a very large sample, almost any nonzero effect, however small, will come out statistically significant

• Just for fun, let’s also look at the “power function” for $$n = 100$$ and increasing true mean $$\mu$$:

# With n = 100, sqrt(n) = 10; c_null is still qnorm(.95) (one-sided test)
powermu <- function(mu) 1 - pnorm(c_null - mu * 10)
range <- data.frame(mu = seq(from = 0, to = 1, length.out = 100))
ggplot(range, aes(mu)) +
stat_function(fun = powermu) + theme_tufte() +
ylab("Power") +
ggtitle("Power as a function of true mean, n = 100")
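
As a sanity check, the formula can be compared against a small Monte Carlo simulation. This is only a sketch, assuming unit variance and a one-sided z-test; the seed, the number of replications, and the choice of a true mean of 0.25 are arbitrary.

```r
set.seed(1)
n <- 100
mu_true <- 0.25
c_one_sided <- qnorm(.95)

# Simulate many samples and record how often the z-test rejects
reject <- replicate(10000, {
  x <- rnorm(n, mean = mu_true)
  sqrt(n) * mean(x) > c_one_sided
})

simulated_power   <- mean(reject)
theoretical_power <- 1 - pnorm(c_one_sided - mu_true * sqrt(n))
c(simulated = simulated_power, theoretical = theoretical_power)
```

The two numbers should agree up to simulation error, which shrinks as the number of replications grows.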

• Here’s the power function for the two-sided alternative:
# Two-sided test at the 5% level: reject when |Z| > qnorm(.975)
c_null <- qnorm(.975)
powermu <- function(mu) pnorm(-c_null - mu*10) + pnorm(c_null - mu*10, lower.tail = FALSE)
range <- data.frame(mu = seq(from = -.5, to = .5, length.out = 100))
ggplot(range, aes(mu)) +
stat_function(fun = powermu) + theme_tufte() +
ylab("Power") +
ggtitle("Power as a function of true mean (two-sided), n = 100")
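
Returning to the earlier remark that the ratio $$\mu/\sigma$$ is what matters, here is a sketch of the two-sided power function with the standard deviation made explicit. The function name and its `sigma`, `n`, and `alpha` arguments are additions for illustration, and $$\sigma$$ is still assumed to be known.

```r
# Two-sided power of a level-alpha z-test when sigma is known
power_two_sided <- function(mu, sigma = 1, n = 100, alpha = 0.05) {
  crit  <- qnorm(1 - alpha / 2)
  shift <- sqrt(n) * mu / sigma  # mean of the z statistic
  pnorm(-crit - shift) + pnorm(crit - shift, lower.tail = FALSE)
}

power_two_sided(0)               # equals alpha when the null is true
power_two_sided(0.2, sigma = 1)
power_two_sided(0.4, sigma = 2)  # same ratio mu / sigma, same power
```

Doubling both $$\mu$$ and $$\sigma$$ leaves the power unchanged, because the test statistic depends on them only through $$\sqrt{n}\,\mu/\sigma$$.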