Announcements

Motivating example

The physicist Albert Michelson conducted experiments to measure the speed of light. Here are 100 of his measurements, as stored in R's built-in morley dataset:

morley$Speed
##   [1]  850  740  900 1070  930  850  950  980  980  880 1000  980  930  650
##  [15]  760  810 1000 1000  960  960  960  940  960  940  880  800  850  880
##  [29]  900  840  830  790  810  880  880  830  800  790  760  800  880  880
##  [43]  880  860  720  720  620  860  970  950  880  910  850  870  840  840
##  [57]  850  840  840  840  890  810  810  820  800  770  760  740  750  760
##  [71]  910  920  890  860  880  720  840  850  850  780  890  840  780  810
##  [85]  760  810  790  810  820  850  870  870  810  740  810  940  950  800
##  [99]  810  870
library(ggplot2)
library(ggthemes)  # provides theme_tufte()
ggplot(morley, aes(Speed + 299000)) + geom_histogram(bins = 16) + theme_tufte()

To work with fewer digits, we drop the 299,000 offset and look only at the remainder, which is what the Speed column stores. These values have the following sample mean and variance:

Xbar = mean(morley$Speed)
sigma2 = var(morley$Speed)
Xbar
## [1] 852.4
sigma2
## [1] 6242.667
# Interval around the mean: Xbar minus/plus two standard errors, sqrt(sigma2/n) with n = 100
Xbar - 2*sqrt(sigma2/100)
## [1] 836.5979
Xbar + 2*sqrt(sigma2/100)
## [1] 868.2021
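
The two endpoints above follow the pattern "estimate plus or minus two standard errors", which gives roughly a 95% interval when the normal approximation for the sample mean is reasonable. A minimal sketch of the same calculation as a reusable function is shown here; approx_ci is a hypothetical helper name, not something defined in these slides.

```r
# Approximate interval for a mean: estimate +/- z standard errors.
# approx_ci is a hypothetical helper, introduced only for illustration.
approx_ci <- function(x, z = 2) {
  n  <- length(x)
  se <- sqrt(var(x) / n)          # standard error of the sample mean
  c(lower = mean(x) - z * se, upper = mean(x) + z * se)
}

ci <- approx_ci(morley$Speed)     # morley ships with R's datasets package
ci                                # interval on the offset scale
ci + 299000                       # interval on the original km/s scale
```

Adding 299,000 back at the end shifts both endpoints by the same amount and leaves the width of the interval unchanged.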

Estimates of parameters

Uniform example

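# theta is the upper endpoint of the Uniform(0, theta) distribution.
# It is used below but not defined in the code shown here, so it must be set beforehand.
n <- 100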
X <- runif(n, min = 0, max = theta)
qplot(X, bins = 10) + theme_tufte()

# The sample maximum, used as an estimate of theta
max(X)
## [1] 2.102546
# Repeat the experiment 1000 times to see how the sample maximum varies
Xn <- replicate(1000, max(runif(n, min = 0, max = theta)))
qplot(Xn, bins = 40) + theme_tufte()

# Estimation error of the maximum; it is never positive because max(X) <= theta
qplot(Xn - theta, bins = 40) + theme_tufte()

Biased/unbiased estimators

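# Estimation error after rescaling the maximum by (n+1)/n
qplot(Xn*(n+1)/n - theta, bins = 40) + theme_tufte()

# Simulated bias of the raw maximum
mean(Xn) - theta
## [1] -0.02117188
# Simulated bias of the rescaled maximum
mean(Xn*(n+1)/n) - theta
## [1] -0.0002032562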
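
The factor (n+1)/n comes from the fact that, for n independent Uniform(0, theta) draws, the expected value of the maximum is n/(n+1) * theta, so the raw maximum has bias -theta/(n+1). The following self-contained sketch checks this by simulation. The value theta_demo = 2 and the seed are assumptions chosen only for illustration; they are not the values used for the output above.

```r
set.seed(1)          # arbitrary seed, only so the sketch is reproducible
theta_demo <- 2      # hypothetical value, not the theta used in the slides
n_demo <- 100

# 1000 replications of the sample maximum
max_demo <- replicate(1000, max(runif(n_demo, min = 0, max = theta_demo)))

bias_raw       <- mean(max_demo) - theta_demo                        # should be near -theta_demo / (n_demo + 1)
bias_corrected <- mean(max_demo * (n_demo + 1) / n_demo) - theta_demo  # should be near 0
bias_theory    <- -theta_demo / (n_demo + 1)

c(raw = bias_raw, corrected = bias_corrected, theory = bias_theory)
```

The simulated bias of the raw maximum should sit close to the theoretical value, while the rescaled estimator should be centered near zero, up to simulation noise.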

Sample variance / standard deviation

Perils of big data