Motivating Question
GOALS
Suppose we have a set of feature variables \((x_1, x_2, \ldots, x_k)\) but NO outcome variable \(y\).
Instead of predicting, classifying, or explaining \(y\), our goal might simply be to:
- Examine the structure of our data.
- Use this examination as a jumping-off point for further analysis.
UNSUPERVISED METHODS
- Cluster analysis
  - Focus: Structure among the rows, i.e. individual cases or data points.
  - Goal: Identify and examine clusters, or distinct groups of cases, with respect to their features \(x\).
  - Methods: hierarchical clustering & K-means clustering
- Dimension reduction
  - Focus: Structure among the columns, i.e. the features \(x\).
  - Goal: Combine groups of correlated features \(x\) into a smaller set of uncorrelated features that preserve most of the information in the data. (We'll discuss the motivation later!)
  - Methods: principal component analysis (PCA)
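To make "structure among the rows" concrete, here is a minimal K-means sketch in Python/NumPy (Lloyd's algorithm), assuming NumPy is available. The seeding helper `farthest_point_init` and the toy two-group data are illustrative choices, not part of the original notes; library implementations seed and restart differently.

```python
import numpy as np

def farthest_point_init(X, k, rng):
    # Illustrative seeding in the spirit of k-means++: start from one random
    # row, then repeatedly add the row farthest from the centroids so far.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    return np.array(centroids)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate between assigning each row (case) to its
    nearest centroid and moving each centroid to the mean of its rows."""
    rng = np.random.default_rng(seed)
    centroids = farthest_point_init(X, k, rng)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centroid: shape (n, k)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step; keep the old centroid if a cluster ends up empty
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: 40 cases, 2 features, two well-separated groups
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(20, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(20, 2)),
])
labels, centroids = kmeans(X, k=2)
print(labels)
```

The output is one cluster label per row: K-means groups cases, leaving the feature columns unchanged.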
CLUSTERING vs DIMENSION REDUCTION EXAMPLE
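For contrast with clustering, a minimal PCA sketch (via the SVD of the column-centered data) shows "structure among the columns": instead of labeling rows, it replaces correlated features with fewer, uncorrelated ones. The helper `pca` and the simulated three-feature data (all driven by one hidden variable `z`) are hypothetical, for illustration only.

```python
import numpy as np

def pca(X, n_components):
    """Principal components via SVD of the column-centered data matrix."""
    Xc = X - X.mean(axis=0)                   # center each feature (column)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]            # each row: a direction in feature space
    scores = Xc @ components.T                # new features for every case
    var_explained = S**2 / np.sum(S**2)       # proportion of total variance per PC
    return scores, components, var_explained[:n_components]

# Hypothetical data: three strongly correlated features driven by one variable z
rng = np.random.default_rng(0)
z = rng.normal(size=100)
X = np.column_stack([
    z + 0.1 * rng.normal(size=100),
    2 * z + 0.1 * rng.normal(size=100),
    -z + 0.1 * rng.normal(size=100),
])

scores, components, ve = pca(X, n_components=1)
print(scores.shape)   # (100, 1): 3 columns summarized by 1
print(ve)             # share of total variance kept by PC1

# With two components, the new features are uncorrelated
scores2, _, _ = pca(X, n_components=2)
r = np.corrcoef(scores2[:, 0], scores2[:, 1])[0, 1]
```

Here the row count stays at 100 while the column count drops, the mirror image of clustering's one-label-per-row output.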
CLUSTERING EXAMPLES
Machine learning about each other
Identify genetic similarities among a group of patients
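As a sketch of the patient example, the snippet below runs bottom-up (agglomerative) hierarchical clustering with average linkage on a made-up 0/1 marker matrix; these are not real genetic data. The function `agglomerative`, the choice of Hamming distance (number of differing markers), and the patient rows are all illustrative assumptions.

```python
import numpy as np

def agglomerative(X, n_clusters):
    """Start with every case in its own cluster, then repeatedly merge the two
    clusters with the smallest average pairwise distance (average linkage)."""
    # Hamming distance: number of markers on which two patients differ
    D = (X[:, None, :] != X[None, :, :]).sum(axis=2)
    clusters = [[i] for i in range(len(X))]
    merges = []  # record of (cluster_a, cluster_b, linkage distance)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges

# Hypothetical patients (rows) x 6 binary genetic markers (columns)
patients = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
])
clusters, merges = agglomerative(patients, n_clusters=2)
print(clusters)  # [[0, 1, 2], [3, 4, 5]]
```

Stopping the merges at different points gives coarser or finer groupings; the `merges` record is what a dendrogram draws.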