Motivating Question
Where are we?
Within the supervised learning framework, we have a categorical response variable \(y\) and a set of potential predictors \(x\). For example:
- y = vote / don’t vote, x = (age, party id, …)
- y = spam / not spam, x = (# of $, # of !, …)
- y = human / car / plant, x = (speed, shape, …)
We have the following goals:
- Build a classification model
  We’ll use the following techniques to build classification models of \(y\) from predictors \(x\):
  - parametric techniques
    - logistic regression (with or without LASSO!)
    - support vector machines (optional)
  - nonparametric techniques
    - K Nearest Neighbors (KNN)
    - classification trees
    - random forests and bagging
- Evaluate the quality of a classification model
  We’ll use the following metrics and tools to evaluate the quality of a classification model:
  - overall accuracy, sensitivity, & specificity
    We can approximate these metrics using in-sample and cross validation techniques.
  - ROC (receiver operating characteristic) curves
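To make the three evaluation metrics concrete, here is a minimal Python sketch that computes them in-sample for a binary classifier. The function name `classification_metrics`, the choice of "spam" as the positive class, and the toy labels are all made up for illustration; they are not taken from the course materials.

```python
def classification_metrics(y_true, y_pred, positive):
    """Return overall accuracy, sensitivity, and specificity.

    Assumes a binary problem in which both classes appear in y_true
    (otherwise one of the denominators below would be zero).
    """
    pairs = list(zip(y_true, y_pred))
    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        # Share of all cases classified correctly.
        "accuracy": (tp + tn) / len(pairs),
        # Share of true positives that the model catches.
        "sensitivity": tp / (tp + fn),
        # Share of true negatives that the model catches.
        "specificity": tn / (tn + fp),
    }


# Hypothetical toy data: 3 spam and 5 non-spam messages.
y_true = ["spam", "spam", "spam",
          "not spam", "not spam", "not spam", "not spam", "not spam"]
y_pred = ["spam", "spam", "not spam",
          "not spam", "not spam", "not spam", "spam", "not spam"]

metrics = classification_metrics(y_true, y_pred, positive="spam")
print(metrics)
```

With these toy labels, 6 of 8 messages are classified correctly, 2 of 3 spam messages are caught, and 4 of 5 non-spam messages are caught. Because the predictions are compared against the same data they would have been fit on, these are in-sample values; a cross-validated version would apply the same function to held-out folds.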