Motivating Question

GOAL

We have a quantitative response variable \(y\) and want to build a predictive regression model of \(y\) using a bunch of potential predictors \(x\).

BUT

The relationships between \(y\) and \(x\) are complicated, thus our existing modeling tools (e.g. least squares algorithm, LASSO) are too rigid. How can we build a flexible predictive regression model?





Parametric vs nonparametric models

The shared goal behind parametric and nonparametric regression models is to build a model of some quantitative response variable \(y\) using predictors \((x_1, x_2, ..., x_p)\):

\[y = f(x_1, x_2, ..., x_p) + \varepsilon\]



Common flexible regression models

  1. K Nearest Neighbors (KNN)
  2. Local regression / locally weighted scatterplot smoothing (LOESS) & generalized additive models (GAM)
  3. Smoothing splines