Outline

  • Optimization
  • Overfitting
  • Validation
  • Cross-validation
  • Summary

Optimization

\[ \text{minimize}_{\beta_0, \beta_1} \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 \]
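
To make this concrete, here is a minimal sketch (my addition, with invented toy data) that minimizes the sum of squares numerically with optim() and checks the answer against lm():

set.seed(1)
x_demo <- rnorm(50)
y_demo <- 2 + 3 * x_demo + rnorm(50)
rss <- function(b) sum((y_demo - b[1] - b[2] * x_demo)^2)
optim(c(0, 0), rss)$par    # numerical minimizer of the RSS
coef(lm(y_demo ~ x_demo))  # lm() solves the same problem in closed form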

\[ \text{minimize}_{\beta_0} \sum_{i=1}^n (y_i - \beta_0)^2 \]

(the simplest special case: with no predictors, the minimizing \(\beta_0\) is just the sample mean \(\bar{y}\))

\[ \text{minimize}_{f} \sum_{i=1}^n (y_i - f(x_i))^2 \]

(now we optimize over a whole function \(f\); with no restriction on \(f\), the training data can be fit perfectly, which is why we model the noise explicitly)

\[ y_i = f(x_i) + e_i \]

\[ \text{minimize}_{f_1, f_2, f_3} \sum_{i=1}^n (y_i - f_3[f_2(f_1[x_i])])^2 \]

  • If you compose linear functions, the resulting function is also linear, just with different coefficients
  • If you compose simple non-linear functions, the resulting function can be a very complicated non-linear one (see the sketch below)
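
A minimal sketch of both bullet points, with made-up functions (this example is mine, not from the original notes):

lin1 <- function(x) 2 * x + 1
lin2 <- function(x) -3 * x + 4
lin2(lin1(0:3))                 # still linear: slope -6, intercept 1
relu <- function(x) pmax(x, 0)  # a simple non-linear function
f <- function(x) relu(1 - relu(2 * x - 1))  # composing it gives a piecewise shape
f(c(-1, 0, 0.5, 1, 2))          # constant, then decreasing, then constant again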

Overfitting
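
A minimal illustration (my addition, with invented data): a very flexible model can fit the training data almost perfectly and still predict new data worse than a simple one.

set.seed(2)
x_sim <- runif(100)
y_sim <- x_sim + rnorm(100, sd = 0.3)  # the true relationship is linear
train_id <- 1:50
fit_line <- lm(y_sim ~ x_sim, subset = train_id)
fit_poly <- lm(y_sim ~ poly(x_sim, 20), subset = train_id)  # degree-20 polynomial
mse <- function(fit, id) mean((y_sim[id] - predict(fit, data.frame(x_sim = x_sim[id])))^2)
c(mse(fit_line, train_id), mse(fit_poly, train_id))  # training error: polynomial looks better
c(mse(fit_line, 51:100), mse(fit_poly, 51:100))      # test error: the line usually wins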

Validation

library(glmnet)    # for cv.glmnet() below
library(dplyr)     # for group_by() / summarize() below
library(ggfortify) # provides autoplot() for cv.glmnet objects

# Simulate n = 300 observations of p = 100 predictors, where only the
# first 3 predictors actually affect the outcome
n <- 300
p <- 100
beta <- c(rep(1, 3), rep(0, p - 3))
X <- matrix(rnorm(n * p), nrow = n)
y <- X %*% beta + rnorm(n)
data <- data.frame(y = y, X = X)

# Randomly split the rows in half: one part for fitting, one for testing
split <- sample(1:n, n/2, replace = FALSE)
train <- data[split, ]
test <- data[-split, ]

# One model using only the 3 true predictors, one using all 100
simple_model <- lm(y ~ X.1 + X.2 + X.3, data = train)
full_model <- lm(y ~ ., data = train) # the . means "every variable in the data"

# Number of estimated coefficients in each model (including the intercept)
c(summary(simple_model)$df[1],
  summary(full_model)$df[1])
## [1]   4 101
# Adjusted R-squared on the training data
c(summary(simple_model)$adj.r.squared,
  summary(full_model)$adj.r.squared)
## [1] 0.7839 0.7976
# Squared prediction error on the held-out test set, one row per observation
errors <- data.frame(
  model = c(rep("simple", n/2), rep("full", n/2)),
  test_error = c((predict(simple_model, newdata = test) - test$y)^2,
                 (predict(full_model, newdata = test) - test$y)^2))

# Compare the two models' test errors
errors %>% group_by(model) %>% summarize(
  median = median(test_error),
  mean = mean(test_error)
)
## # A tibble: 2 x 3
##   model  median  mean
##   <fct>   <dbl> <dbl>
## 1 full    1.23  2.87 
## 2 simple  0.458 0.914

Even though the full model had the slightly higher adjusted R-squared on the training data, the simple model predicts the held-out test set far better: the full model has overfit.
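
A quick sketch (my addition) repeating the random split a few times usually shows the same pattern, so the gap is not a fluke of one particular split:

replicate(5, {
  s <- sample(1:n, n/2)
  m_simple <- lm(y ~ X.1 + X.2 + X.3, data = data[s, ])
  m_full   <- lm(y ~ ., data = data[s, ])
  c(simple = mean((predict(m_simple, data[-s, ]) - data$y[-s])^2),
    full   = mean((predict(m_full, data[-s, ]) - data$y[-s])^2))
})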

Cross-validation

# Lasso regression (glmnet's default) with 5-fold cross-validation to choose the penalty
x_train <- X[split, ]
y_train <- y[split]
x_test <- X[-split, ]
y_test <- y[-split]
model <- cv.glmnet(x_train, y_train, nfolds = 5)
autoplot(model) # cross-validation error as a function of the penalty

coef(model)
## 101 x 1 sparse Matrix of class "dgCMatrix"
##                   1
## (Intercept) 0.06806
## V1          0.70578
## V2          0.69831
## V3          0.85682
## V4          .      
## V5          .      
## (rows V6 through V100 omitted: every remaining coefficient is zero)
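
To connect back to validation, a short sketch (my addition) scoring the cross-validated fit on the held-out test set; predict() on a cv.glmnet object uses the CV-chosen penalty (s = "lambda.1se" by default, matching coef() above):

pred <- predict(model, newx = x_test, s = "lambda.1se")
mean((pred - y_test)^2)  # test mean squared error of the CV-selected model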

Summary

  • Many cutting-edge methods in ML/AI rely on optimization, just like linear regression
  • More complex models have more parameters and/or more complex types of functions to predict the outcome variable
  • Too much model complexity can lead to overfitting: for example, including too many predictor variables can make it seem like the model predicts better than it actually does on new data
  • Prediction accuracy on a held-out test or validation set is a higher standard
  • Remember the bias-variance tradeoff when choosing how complex a model to use
  • Cross-validation is a very useful method (when the sample size is large enough) for automatically picking models with low test error