- Study guide, homework solutions
- Today: variance, Chebyshev’s inequality, random samples

- Suppose \(X\) has distribution \(p_X(x)\) with domain \(D\) (the set of possible values), and \(f\) is some function
- \(f(X)\) is a new random variable. What is its expectation? \[E[f(X)] = \sum_{x \in D} f(x) p_X(x)\]
- e.g. variance calculation \(f(x) = (x-\mu)^2\) (see Bernoulli example below)
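The formula for \(E[f(X)]\) is a weighted sum over \(D\), which takes a line of base R. This is a minimal sketch. The three-point distribution (which happens to be Bin(2, 1/2)) and the helper name `expect_f` are hypothetical, chosen only for illustration.

```r
# Hypothetical small distribution: values in D and their probabilities p_X(x)
D   <- c(0, 1, 2)
p_X <- c(0.25, 0.5, 0.25)
stopifnot(isTRUE(all.equal(sum(p_X), 1)))  # probabilities must sum to 1

# Hypothetical helper: E[f(X)] = sum over x in D of f(x) * p_X(x)
# (f must be vectorized so that f(D) evaluates f at every value in D)
expect_f <- function(f, D, p_X) sum(f(D) * p_X)

mu    <- expect_f(function(x) x, D, p_X)             # E[X] = 1
var_X <- expect_f(function(x) (x - mu)^2, D, p_X)    # E[(X - mu)^2] = 0.5
```

The same helper gives both the mean (with \(f(x) = x\)) and the variance (with \(f(x) = (x-\mu)^2\)).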

- Measure of dispersion (spread) of a random variable
- Variance: \(\text{Var}(X) = E[(X - \mu)^2]\), where \(\mu = E[X]\): the expected *squared distance* of \(X\) from \(\mu\)
- e.g., for \(X \sim\) Ber(\(p\)), \(\mu = p\), and \((X - p)^2\) is a random variable which equals \((1-p)^2\) with probability \(p\) and equals \((0-p)^2\) with probability \(1-p\). To find \(\text{Var}(X)\) we compute the expected value of \((X-p)^2\), like this: \[ \text{Var}(X) = (0-p)^2 \cdot (1-p) + (1-p)^2 \cdot p = [p^2 + (1-p)p](1-p) = p(1-p) \]
- Notation: sometimes use \(\sigma^2\) for \(\text{Var}(X)\) (if it’s clear from the context)
- Does this make sense as a measure of the dispersion of a Bernoulli?
If \(p = 1/2\), then \(\sigma^2 = 1/4\); if \(p = 9/10\), then \(\sigma^2 = 9/100\). A \(p\) close to 1 makes the Bernoulli more “concentrated” (it is almost always 1), so it has less variance
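The two-outcome calculation above translates directly into code. This is a small sketch; the function name `bernoulli_var` is a hypothetical choice. It lets us check the two values of \(p\) mentioned above against the closed form \(p(1-p)\).

```r
# Var(X) for X ~ Ber(p), computed as E[(X - p)^2] over the two outcomes 0 and 1
bernoulli_var <- function(p) {
  (0 - p)^2 * (1 - p) +  # X = 0 with probability 1 - p
  (1 - p)^2 * p          # X = 1 with probability p
}

ps <- c(1/2, 9/10)
bernoulli_var(ps)                                        # 0.25 and 0.09
isTRUE(all.equal(bernoulli_var(ps), ps * (1 - ps)))      # matches p(1 - p)
```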

- If \(X\) and \(Y\) are **independent**, then \(\text{Var}(X+Y) = \text{Var}(X) + \text{Var}(Y)\)
- Unlike the case for expectation, where \(E[X+Y] = E[X] + E[Y]\) holds with no extra requirement, here we need independence
- Without independence: there is a more complicated formula we won’t get into now
- Let’s look at some Binomial examples
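A quick simulation sketch (base R only; the seed and the number of simulations are arbitrary choices) checks the rule for two independent Ber(1/2) variables. It also shows the rule failing when \(Y\) is simply a copy of \(X\), which is as far from independent as possible.

```r
set.seed(1)      # arbitrary seed, only for reproducibility
n_sims <- 100000
X <- rbinom(n_sims, 1, 0.5)   # X ~ Ber(1/2), Var(X) = 1/4
Y <- rbinom(n_sims, 1, 0.5)   # Y ~ Ber(1/2), drawn independently of X

var(X + Y)   # close to 1/4 + 1/4 = 1/2, as the rule predicts
var(X + X)   # Y = X is not independent: Var(2X) = 4 Var(X), close to 1, not 1/2
```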

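```r
library(ggplot2)
library(ggthemes)  # provides theme_tufte()

# pmfs of Bin(80, 1/2) and Bin(120, 1/2), each over a 31-value window around its mean
df <- data.frame(x = c(25:55, 45:75),
                 y = c(dbinom(25:55, 80, .5), dbinom(45:75, 120, .5)),
                 n = factor(c(rep("80", 31), rep("120", 31)), levels = c("80", "120")))
# Overlaid, semi-transparent bars so both distributions stay visible
ggplot(df, aes(x, y, fill = n)) +
  geom_col(position = "identity", alpha = .4) +
  theme_tufte() + ggtitle("Binomial distributions with p = 1/2")
```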

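```r
library(ggplot2)
library(ggthemes)  # provides theme_tufte()

# Same two pmfs, shifted so each is centered at 0 (subtract the means 40 and 60)
df <- data.frame(x = c(-18:18, -18:18),
                 y = c(dbinom(22:58, 80, .5), dbinom(42:78, 120, .5)),
                 n = factor(c(rep("80", 37), rep("120", 37)), levels = c("80", "120")))
ggplot(df, aes(x, y, fill = n)) +
  geom_col(position = "identity", alpha = .4) +
  theme_tufte() + ggtitle("Binomial distributions with p = 1/2 (centered)")
```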

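- Larger \(n\) leads to larger variance: a Bin(\(n,p\)) variable is a sum of \(n\) independent Ber(\(p\)) variables, so \(\text{Var}(X) = np(1-p)\), which is 20 for \(n = 80\) and 30 for \(n = 120\) when \(p = 1/2\)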
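Because a Bin(\(n,p\)) variable is a sum of \(n\) independent Ber(\(p\)) variables, adding their variances gives \(np(1-p)\). The sketch below (the function name `binom_var` is hypothetical) checks this by computing \(E[(X-\mu)^2]\) exactly from `dbinom`. It sums over the full support \(0, \dots, n\), not just the window plotted above.

```r
# Var(X) = E[(X - mu)^2], summed exactly over the support 0..n of Bin(n, p)
binom_var <- function(n, p) {
  x  <- 0:n
  px <- dbinom(x, n, p)
  mu <- sum(x * px)          # E[X], which equals n * p
  sum((x - mu)^2 * px)
}

binom_var(80, 0.5)    # equals 80 * 0.5 * 0.5 = 20 (up to rounding)
binom_var(120, 0.5)   # equals 120 * 0.5 * 0.5 = 30 (up to rounding)
```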

- Let \(a > 0\) be any constant.
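- Chebyshev’s inequality: for any random variable \(X\) with mean \(\mu\) and variance \(\sigma^2\), \(P(|X - \mu| \geq a) \leq \sigma^2/a^2\)
- For example, taking \(a = 2\sigma\) gives \(P(|X - \mu| \geq 2\sigma) \leq 1/4\)
- Look familiar? (We’ll return to the 68-95-99.7 rule soon.)
- This helps justify using expectation and variance as summaries of the full probability distribution: knowing only \(\mu\) and \(\sigma^2\) already limits how much probability can sit far from \(\mu\)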
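Chebyshev’s bound holds for every distribution, so for a particular one it is often loose. As a sketch, take \(X \sim\) Bin(100, 1/2) as an example distribution (\(\mu = 50\), \(\sigma = 5\)) and compare the bound at \(a = 2\sigma\) with the exact tail probability from `pbinom`:

```r
# Compare Chebyshev's bound with the exact tail probability for X ~ Bin(100, 1/2)
n <- 100; p <- 0.5
mu    <- n * p                    # mean: 50
sigma <- sqrt(n * p * (1 - p))    # standard deviation: 5
a     <- 2 * sigma                # a = 2 sigma = 10

# Exact: P(|X - 50| >= 10) = P(X <= 40) + P(X >= 60)
exact <- pbinom(mu - a, n, p) + pbinom(mu + a - 1, n, p, lower.tail = FALSE)
bound <- sigma^2 / a^2            # Chebyshev: sigma^2 / (2 sigma)^2 = 1/4

c(exact = exact, bound = bound)   # exact is roughly 0.06, well below 0.25
```

The gap is expected: the bound uses only \(\mu\) and \(\sigma^2\), so it must also cover distributions whose tails are much heavier than the Binomial’s.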

- Suppose \(X \sim\) Bin(\(n,p\)), and we want to visualize the distribution function of \(X\)
- Since we know the formulas we could use those as before
- But if we can *generate* many observed values of \(X\), we could also look at the histogram of those values. This will be a histogram of data rather than the “true” distribution function (much of statistics works by relating these two “worlds”)
- Remember the Galton board?

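```r
library(ggplot2)
library(ggthemes)  # provides theme_tufte()

# "Model" world: the exact pmf of Bin(100, 1/2) over 35..65
df <- data.frame(x = 35:65, y = dbinom(35:65, 100, .5))
ggplot(df, aes(x, y)) + geom_col() + theme_tufte()
```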

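```r
library(ggplot2)
library(ggthemes)  # provides theme_tufte()

# "Data" world: 500 simulated draws from Bin(100, 1/2), plotted as counts
df <- data.frame(x = rbinom(500, 100, .5))
ggplot(df, aes(x)) + geom_bar() + theme_tufte()  # geom_bar() uses stat_count()
```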

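```r
library(ggplot2)
library(ggthemes)  # provides theme_tufte()

# Observed proportions from 500 draws (table names are the observed values)
sampledX <- table(rbinom(500, 100, .5)) / 500
nbins <- length(sampledX)
# Stack the model pmf (35..65) and the observed proportions into one data frame
df <- data.frame(x = c(35:65, as.integer(names(sampledX))),
                 y = c(dbinom(35:65, 100, .5), as.numeric(sampledX)),
                 world = c(rep("model", 31), rep("data", nbins)))
# Overlay the two "worlds" with transparency
ggplot(df, aes(x, y, fill = world)) +
  geom_col(position = "identity", alpha = .4) + theme_tufte()
```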

```r
library(ggplot2)
library(ggthemes)  # provides theme_tufte()

# Same comparison with 5000 draws: the data histogram should track the model more closely
sampledX <- table(rbinom(5000, 100, .5)) / 5000
nbins <- length(sampledX)
df <- data.frame(x = c(35:65, as.integer(names(sampledX))),
                 y = c(dbinom(35:65, 100, .5), as.numeric(sampledX)),
                 world = c(rep("model", 31), rep("data", nbins)))
ggplot(df, aes(x, y, fill = world)) +
  geom_col(position = "identity", alpha = .4) + theme_tufte()
```
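The plots compare the two “worlds” visually. A numeric version of the same comparison is sketched below in base R; the seed is arbitrary and the choice of summaries is ours. It puts the sample mean and variance of 5000 draws next to the model values \(np = 50\) and \(np(1-p) = 25\), then reports the largest gap between observed proportions and the pmf.

```r
set.seed(2)  # arbitrary seed, only for reproducibility
n <- 100; p <- 0.5
draws <- rbinom(5000, n, p)

# Summaries from the "data" world next to the corresponding "model" values
c(sample_mean = mean(draws), model_mean = n * p)            # model mean: 50
c(sample_var  = var(draws),  model_var  = n * p * (1 - p))  # model variance: 25

# Largest gap between observed proportions and the pmf over the support 0..n
emp <- tabulate(draws + 1, nbins = n + 1) / length(draws)   # proportions of 0..n
max(abs(emp - dbinom(0:n, n, p)))
```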