10  Regression Review


Note

Most of today’s class time will be devoted to working on Group Assignment 1.

To make the most of our time together, I strongly recommend that you get started on the assignment before today’s class. In particular, I suggest that you finish Steps 1–6 described in the Process section of the GA1 Instructions.


Settling In

  • Sit with your Group Assignment 1 group
  • Catch up on any recent posts you’ve missed on Slack



Preparing for Quiz 1

Logistics

When: March 4

  • first 60 minutes = Quiz 1
  • last 30 minutes = GA1 Work Time

Topics:

  • ML Overview (Unit 0) + Regression (Units 1–3)
  • concepts and code
  • (use the Course Learning Goals as a study guide!)

Format:

  • On paper (no computers!)
  • Closed notes, except for an instructor-provided R notesheet
  • Questions will range in style: multiple choice, fill in the blank, short response, matching, etc.

Questions?


Context

What have we covered so far?

For the Regression task:

  • Unit 0: What is Regression? (And how does it differ from other types of Machine Learning tasks?)
  • Unit 1: Evaluating regression models
  • Unit 2: Building regression models / selecting predictors
  • Unit 3: Building flexible (nonparametric, nonlinear) regression models

General concepts that translate to other ML tasks:

  • Overfitting
  • Cross validation
  • Bias-variance tradeoff
  • Algorithms and tuning parameters
  • Preprocessing steps
  • Parametric vs nonparametric models


Review & Reflection

Note

Though you won’t hand anything in or work on this in class today, you’re strongly encouraged to complete this activity.

STAT 253 is a survey course of statistical machine learning techniques and concepts. It’s important to continuously reflect on these and how they fit together. The materials linked below are designed to help you reflect upon:

  • ML concepts
    • enduring, big picture concepts
    • technical concepts
    • tidymodels code
  • Your progress toward…
    • engagement
    • collaboration
    • preparation (checkpoints)
    • exploration (homework)

Find and make a copy of the following 2 resources. You’ll be given some relevant prompts below, but you should use these materials in whatever way suits you! Take notes, add more content, rearrange, etc.


Concept Maps

Mark up slides 1–5 of the concept map with respect to the prompts below. Much of this overlaps with HW3.

Enduring, big picture concepts

IMPORTANT to your learning: Respond in your own words.

  • When do we use a supervised vs an unsupervised learning algorithm?
  • Within supervised learning, when do we use a regression vs a classification algorithm?
  • What is the importance of “model evaluation” and what questions does it address?
  • What is “overfitting” and why is it bad?
  • What is “cross-validation” and what problem is it trying to address?
  • What is the “bias-variance tradeoff”?

Technical concepts

On slide 2, identify some general themes for each model algorithm listed in the left-hand table:

  • What’s the goal?
  • Is the algorithm parametric or nonparametric?
  • Does the algorithm have any tuning parameters? What are they, how do we tune them, and how is this a Goldilocks problem?
  • What are the key pros & cons of the algorithm?

For each algorithm, you should also reflect upon these important technical concepts:

  • Can you summarize the steps of this algorithm?
  • Is the algorithm parametric or nonparametric? (addressed above)
  • What is the bias-variance tradeoff when working with or tuning this algorithm?
  • Is it important to scale / pre-process our predictors before feeding them into this algorithm?
  • Is this algorithm “computationally expensive”? What factors affect the computation time/cost?
  • Can you interpret the technical (RStudio) output for this algorithm (e.g., CV plots)?

Model evaluation

On slide 2, do the following for each model evaluation question in the right-hand table:

  • Identify what to check or measure in order to address the question, and how to interpret it.
  • Explain the steps of the CV algorithm.
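
As a companion to the second prompt, here is a minimal sketch of k-fold cross-validation written in plain Python rather than the course's tidymodels code. Everything in it is an assumption made for illustration: the function names `fit_line` and `k_fold_cv_mae`, the choice of a one-predictor least squares model, the use of mean absolute error (MAE) as the metric, and the made-up toy data. Try writing out the steps in your own words first, then compare.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def k_fold_cv_mae(xs, ys, k=5, seed=253):
    """Estimate out-of-sample MAE with k-fold cross-validation."""
    # Step 1: shuffle the row indices and split them into k folds.
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]

    fold_maes = []
    for held_out in folds:
        # Step 2: train on every fold except the held-out one.
        held = set(held_out)
        train = [i for i in idx if i not in held]
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])

        # Step 3: predict the held-out rows and record that fold's MAE.
        errors = [abs(ys[i] - (b0 + b1 * xs[i])) for i in held_out]
        fold_maes.append(mean(errors))

    # Step 4: average the k fold-level errors into one CV estimate.
    return mean(fold_maes)


# Made-up toy data (not from the course): y is roughly 2 + 3x plus noise.
rng = random.Random(1)
xs = [float(x) for x in range(20)]
ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]

print(round(k_fold_cv_mae(xs, ys, k=5), 3))
```

Each row is held out exactly once, so every prediction that contributes to the final average comes from a model that never saw that row during training.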

Algorithm comparisons

  • Use slide 3 to make other observations about the Unit 1–3 modeling algorithms and their connections.
  • Use slide 4 to address and compare the interpretability & flexibility of the Subset Selection (e.g. backward stepwise), LASSO, and Least Squares algorithms. Where would you place Splines and KNN on this graphic?


tidymodels Code Comparison

Check out and reflect upon the tidymodels code comparisons here. Copy, use, tweak, and add to this in whatever way suits you!




Other Study Tips

  • Create a study guide using the course learning goals
  • Review past checkpoints, in-class exercises and notes (including R code), and homework problems
    • quiz yourself! (e.g., try the checkpoints/exercises again, without notes)
    • review homework feedback and implement any of the suggestions you received
  • Complete the provided review activities:
    • Concept maps
    • tidymodels code comparison
    • HW3
    • Group Assignment 1
  • Come to office hours (mine and the preceptors’)!



Exercises

Use the rest of class time today to work on Group Assignment 1!



Suggestions:

  1. Check in with each other as humans.

  2. Go around the table and report (individually):

    • What progress have you made since your last check-in?
    • What insights have you gained about the data or models thus far?
    • What questions do you still have about the data, the model(s) you’ve tried, etc?
  3. As a group, decide collectively:

    • which predictors you are going to use (and how to justify those choices!)
    • what modifications you need to make, if any, to those predictors (e.g., combining categories, changing units)
    • which one predictive model you will present in your report as your final/“best” model (and how to justify that choice!)
    • how you will divide up the writing (and reviewing/editing!) of each section of the report
    • what questions you want to ask your instructor today
  4. Then, get to work! Check in with each other and the instructor as you go.

  5. Before you leave, make sure you have a clear plan for any remaining tasks.



Submission reminders:

  • Each group will submit one (HTML) report
    • Use the Quarto template provided on Moodle
    • Carefully review the instructions (on Moodle) and rubric to ensure that you’ve included all requested information in the appropriate section
  • Each individual will submit the Group Assignment Feedback Survey (link on Moodle)
  • Deadline for both components: end-of-day Friday, Mar 6



Wrapping Up

After class today: