A brief introduction to the course, preview of things to come, and some foundational background material.

This week introduced some of the key conceptual themes in machine learning. Two simple examples illustrated different strategies for building more complex models:

- increasing the complexity of the function class, for example by allowing functions to fit flexibly/locally to different subsets of the data
- increasing the dimension of the predictors (while otherwise keeping the function class fixed)
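The two strategies above can be sketched in a short simulation. This is a minimal numpy illustration, not code from the course notebooks: the data-generating process, variable names, and number of bins are all hypothetical choices made for the example. It compares a single global line against (1) separate lines fit on local subsets of the data and (2) the same linear model with an extra predictor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical simulated data: the response depends nonlinearly on x1
# and linearly on a second predictor x2
x1 = rng.uniform(0, 10, n)
x2 = rng.normal(0, 1, n)
y = np.sin(x1) + 0.5 * x2 + rng.normal(0, 0.3, n)

def ols_mse(X, y):
    """Training MSE of a least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((y - X @ beta) ** 2)

def local_mse(x, y, n_bins):
    """Training MSE from fitting a separate line on each local subset of x."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # assign each point to exactly one bin (clip so x.max() lands in the last bin)
    bins = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    sse = 0.0
    for b in range(n_bins):
        m = bins == b
        X = np.column_stack([np.ones(m.sum()), x[m]])
        beta, *_ = np.linalg.lstsq(X, y[m], rcond=None)
        sse += np.sum((y[m] - X @ beta) ** 2)
    return sse / len(y)

base = ols_mse(x1, y)                              # one global line in x1
strategy1 = local_mse(x1, y, 5)                    # flexible/local fits
strategy2 = ols_mse(np.column_stack([x1, x2]), y)  # add a predictor
print(base, strategy1, strategy2)
```

Both strategies achieve a lower training MSE than the baseline, because each one fits a model class that contains the single global line as a special case.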

Model complexity relates to the bias-variance trade-off: more complexity *typically* results in lower bias and higher variance.

Increasing complexity also (essentially always) results in a lower mean-squared error (MSE) if the MSE is calculated on the same dataset that was used to fit the model. But if the MSE is calculated on a different dataset this is no longer true: more complexity may result in a larger MSE.
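A small simulation makes this concrete. As before this is an illustrative numpy sketch, not the course's own code: the true function, noise level, and polynomial degrees are assumptions made for the example. For each of many replications we fit polynomials of increasing degree to a training set and compute the MSE both on that training set and on a fresh dataset drawn from the same process.

```python
import numpy as np

rng = np.random.default_rng(0)

def dataset(n=50, sigma=0.4):
    """One hypothetical dataset: noisy observations of sin(x)."""
    x = rng.uniform(-2, 2, n)
    return x, np.sin(x) + rng.normal(0, sigma, n)

degrees = [1, 3, 9]
train_mse = {d: [] for d in degrees}
test_mse = {d: [] for d in degrees}

for _ in range(200):
    x_tr, y_tr = dataset()
    x_te, y_te = dataset()   # fresh data from the same process
    for d in degrees:
        coefs = np.polyfit(x_tr, y_tr, d)
        train_mse[d].append(np.mean((y_tr - np.polyval(coefs, x_tr)) ** 2))
        test_mse[d].append(np.mean((y_te - np.polyval(coefs, x_te)) ** 2))

for d in degrees:
    print(d, np.mean(train_mse[d]), np.mean(test_mse[d]))
```

Training MSE decreases at every step up in degree, since each higher-degree model contains the lower-degree ones. But the average MSE on fresh data is U-shaped: the degree-3 fit beats both the too-simple degree-1 fit and the too-complex degree-9 fit.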

Why should we evaluate model fit (like MSE) on a different dataset from the one used to fit the model? First, if we evaluate it on the same dataset, then such an evaluation will always prefer greater complexity until the model fully saturates the data. At that point nothing is gained from using a model: we have only created a map as large as the entire territory. Second, if our purpose in using a model is to describe some *stable* aspect of the world, then we would hope that the model's fit does not immediately fail when the time or context of the data collection is slightly different.

Since these concepts are so central to machine learning, we will return to them several times throughout the term and understand them through more examples and some mathematical derivations.

Slides for first video

Notebook for `gapminder` example

Notebook for `candy` example

Notebook from seminar

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

For attribution, please cite this work as

Loftus (2021, Jan. 15). Neurath's Speedboat: Introduction and foundations. Retrieved from http://joshualoftus.com/ml4ds/01-introduction-foundations/

BibTeX citation

@misc{loftus2021introduction,
  author = {Loftus, Joshua},
  title = {Neurath's Speedboat: Introduction and foundations},
  url = {http://joshualoftus.com/ml4ds/01-introduction-foundations/},
  year = {2021}
}