Machine Learning 4 Data Science

Regularization and validation

When optimizing an ML model, a variety of strategies can improve generalization beyond the training data: we can add a complexity penalty to the loss function, and we can evaluate the loss on held-out validation data.
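As a minimal sketch of both ideas together (on synthetic data, with arbitrary penalty strengths), the snippet below fits ridge regression, which adds a squared-norm complexity penalty to the squared-error loss, and picks the penalty strength by evaluating the loss on a held-out validation split:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (purely illustrative): a sparse linear signal plus noise.
X = rng.normal(size=(100, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
y = X @ w_true + rng.normal(scale=0.5, size=100)

# Hold out the last 30 rows as validation data.
X_train, X_val = X[:70], X[70:]
y_train, y_val = y[:70], y[70:]

def ridge_fit(X, y, lam):
    """Minimize ||y - Xw||^2 + lam * ||w||^2 (closed-form solution)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def val_mse(w):
    """Mean squared error on the validation split."""
    return np.mean((y_val - X_val @ w) ** 2)

# Choose the penalty strength by validation loss, not training loss.
lams = [0.01, 0.1, 1.0, 10.0]
best_lam = min(lams, key=lambda lam: val_mse(ridge_fit(X_train, y_train, lam)))
```

The penalty keeps the coefficients small (reducing variance), while the validation split gives an honest estimate of out-of-sample loss for choosing between penalty strengths.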

Optimization and overfitting

Optimization is about finding the best model. As model complexity grows, it becomes increasingly important to avoid overfitting: finding a model that is best for one specific dataset but does not generalize well to new data.


Categorical or qualitative outcome variables are ubiquitous. We review some supervised learning methods for classification, and see how they may be applied to observational causal inference.
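As one concrete classification method, the sketch below fits logistic regression by gradient descent on synthetic data (the true weights, learning rate, and iteration count are all illustrative choices, not recommendations):

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic binary labels drawn from a logistic model with known weights.
n = 500
X = rng.normal(size=(n, 2))
w_true = np.array([2.0, -1.0])
p = 1 / (1 + np.exp(-(X @ w_true)))
y = (rng.uniform(size=n) < p).astype(float)

# Fit by gradient descent on the (average) logistic log-loss.
w = np.zeros(2)
lr = 0.1
for _ in range(500):
    preds = 1 / (1 + np.exp(-(X @ w)))
    w -= lr * (X.T @ (preds - y)) / n

# Classify by thresholding the predicted probability at 0.5.
accuracy = np.mean(((1 / (1 + np.exp(-(X @ w)))) > 0.5) == (y == 1))
```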

Multiple regression and causality

Multiple linear regression does not, by default, tell us anything about causality. But with the right data and careful interpretation, we may be able to learn something about causal relationships.
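A small simulation (entirely synthetic, with made-up coefficients) sketches one version of "the right data and careful interpretation": when a confounder is observed, regressing on it alongside the treatment can recover the causal effect, while omitting it gives a biased slope.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# z confounds the effect of x on y: it drives both of them.
z = rng.normal(size=n)                        # observed confounder
x = 1.5 * z + rng.normal(size=n)              # "treatment" depends on z
y = 2.0 * x + 3.0 * z + rng.normal(size=n)    # true causal effect of x is 2.0

def ols(X, y):
    """Least-squares coefficients, with an intercept prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

naive_slope = ols(x[:, None], y)[1]                   # biased by the confounder
adjusted_slope = ols(np.column_stack([x, z]), y)[1]   # close to the true 2.0
```

The naive regression attributes some of z's effect to x; adding z as a regressor holds it fixed. This only works because z is observed, which is exactly where careful interpretation comes in.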

Linear regression

We review linear regression, framing it as a prototypical example of, and a source of intuition for, other machine learning methods.
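The prototype really is small: on synthetic data (made-up intercept, slope, and noise level), an ordinary least-squares fit is a design matrix and one linear-algebra call.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 1-D data: y = 3 + 2x plus noise.
x = rng.uniform(0, 10, size=50)
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=50)

# Design matrix with an intercept column, then solve least squares.
A = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = coef
```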

Introduction and foundations

A brief introduction to the course, preview of things to come, and some foundational background material.





ISLR + ESL + CASI + Mixtape + R4DS + MLstory


Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".