10  Regression Review

Settling In

  • Sit with your assigned group
  • Catch up on any posts you’ve missed on Slack
  • Plan your Capstone Days schedule!
  • Open Group Assignment 1 (and the recently updated instructions)



Notes: Preparing for Quiz 1

Logistics

Content:

  • Regression (Units 1–3) (+ Day 1 Introductions)
  • Questions will range in style: multiple choice, fill in the blank, short response, matching, etc.

Part 1:

  • focused on concepts
  • on paper
  • closed notes
  • closed people

Part 2:

  • focused on code
  • on paper
  • instructor-provided notesheet
  • closed people



Context

What have we covered so far?

For the Regression task:

  • Day 1: What is Regression? (And what other types of Machine Learning tasks are there?)
  • Unit 1: Evaluating regression models
  • Unit 2: Building regression models / selecting predictors
  • Unit 3: Building flexible (nonparametric, nonlinear) regression models

General concepts that translate to other ML tasks:

  • Overfitting
  • Cross validation
  • Bias-variance tradeoff
  • Algorithms and tuning parameters
  • Preprocessing steps
  • Parametric vs nonparametric models



Review & Reflection

STAT 253 is a survey course of statistical machine learning techniques and concepts. It’s important to continually reflect on these and how they fit together. Though you won’t hand anything in or work on this in class today, you’re strongly encouraged to complete this activity. The materials linked below are designed to help you reflect upon:

  • ML concepts
    • enduring, big picture concepts
    • technical concepts
    • tidymodels code
  • Your progress toward…
    • engagement
    • collaboration
    • preparation (checkpoints)
    • exploration (homework)

Find and make a copy of the following 2 resources. You’ll be given some relevant prompts below, but you should use these materials in whatever way suits you! Take notes, add more content, rearrange, etc.


Concept Maps

Mark up slides 1–4 of the concept map with respect to the prompts below. Much of this overlaps with HW3.

Enduring, big picture concepts

IMPORTANT to your learning: Respond in your own words.

  • When do we use a supervised vs an unsupervised learning algorithm?
  • Within supervised learning, when do we use a regression vs a classification algorithm?
  • What is the importance of “model evaluation” and what questions does it address?
  • What is “overfitting” and why is it bad?
  • What is “cross-validation” and what problem is it trying to address?
  • What is the “bias-variance tradeoff”?

Technical concepts

On page 2, identify some general themes for each model algorithm listed in the left-hand table:

  • What’s the goal?
  • Is the algorithm parametric or nonparametric?
  • Does the algorithm have any tuning parameters? What are they, how do we tune them, and how is this a Goldilocks problem?
  • What are the key pros & cons of the algorithm?

For each algorithm, you should also reflect upon these important technical concepts:

  • Can you summarize the steps of this algorithm?
  • Is the algorithm parametric or nonparametric? (addressed above)
  • What is the bias-variance tradeoff when working with or tuning this algorithm?
  • Is it important to scale / pre-process our predictors before feeding them into this algorithm?
  • Is this algorithm “computationally expensive”?
  • Can you interpret the technical (RStudio) output for this algorithm (e.g., CV plots)?

Model evaluation

On page 2, do the following for each model evaluation question in the right-hand table:

  • Identify what to check or measure in order to address the question, and how to interpret it.
  • Explain the steps of the CV algorithm.
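
As a self-check on the CV steps, here is a minimal sketch of k-fold cross-validation. It is written in plain Python rather than the tidymodels code used in this course, and everything named in it is an assumption made for illustration: the helper functions `fit_least_squares` and `k_fold_cv_mae`, the single-predictor least squares model, the MAE metric, and the choice of k = 5. Try explaining each step in your own words before reading the comments.

```python
import random


def fit_least_squares(xs, ys):
    """Fit y = b0 + b1 * x by ordinary least squares (one predictor)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def k_fold_cv_mae(xs, ys, k=5, seed=253):
    """Return the k-fold cross-validated mean absolute error (MAE)."""
    # Step 1: randomly split the data points into k non-overlapping folds.
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]

    fold_maes = []
    for test_idx in folds:
        # Step 2: hold out one fold; train the model on the other k - 1 folds.
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        b0, b1 = fit_least_squares(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        # Step 3: measure prediction error on the held-out fold only.
        errors = [abs(ys[i] - (b0 + b1 * xs[i])) for i in test_idx]
        fold_maes.append(sum(errors) / len(errors))

    # Step 4: average the k held-out error estimates.
    return sum(fold_maes) / k


# Hypothetical toy data: a linear trend plus some random noise.
rng = random.Random(1)
toy_x = [float(i) for i in range(30)]
toy_y = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in toy_x]
cv_mae = k_fold_cv_mae(toy_x, toy_y, k=5)
```

Because every data point is used for testing exactly once and never to test a model it helped train, the averaged error is a more honest estimate of performance on new data than the in-sample error.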

Algorithm comparisons

  • Use page 3 to make other observations about the Unit 1–3 modeling algorithms and their connections.
  • Use page 4 to address and compare the interpretability & flexibility of the Subset Selection (e.g. backward stepwise), LASSO, and Least Squares algorithms. Where would you place Splines and KNN on this graphic?


tidymodels Code Comparison

Check out and reflect upon the tidymodels code comparisons here. Copy, use, tweak, and add to this in whatever way suits you!



(Coming Soon!) Learning Reflection 1

The reflections above address your understanding of key machine learning concepts. Now that we are over a third of the way through the semester, I’d also like you to take some time to reflect on your engagement with the course and your progress toward the course learning goals (here and here).

To this end, you will complete the first of our three Learning Reflection assignments. More detailed instructions coming soon!

Learning Reflection 1 Deadline: Tuesday, March 11.



Other Study Tips

  • Use the course learning goals as a study guide
  • Use past checkpoints, in-class exercises, and homework problems as a “practice quiz”
  • Come to office hours (mine and the preceptors’)!
  • Work on Group Assignment 1



Exercises

Use the rest of class time today to work on Group Assignment 1!

  • Goal for Day 1: finish steps 1–6 and start step 7.
  • Goal for Day 2: finish steps 7–9.
  • (You will also likely need to do some work outside of class time. Plan accordingly!)



Wrapping Up

  • finish HW3 (due TONIGHT)
  • continue working on Group Assignment 1
  • enjoy MSCS Capstone Days!
    • attend 3 talks (instead of class)
    • bring your completed Capstone Reflection to class next Tuesday
  • get started on:
    • studying for Quiz 1
    • Learning Reflection 1