10  Regression Review

Settling In

  • Sit with your assigned group
  • Catch up on any posts you’ve missed on Slack
  • Open Group Assignment 1


Quiz 1 Reminders

Content:

  • Regression (Units 1–3)
  • Question formats will vary: multiple choice, fill-in-the-blank, short response, matching, etc.

Part 1:

  • on paper
  • closed people, closed notes
  • due by the end of class
  • you might be asked to interpret some R output, but I won’t ask you to produce any code

Part 2:

  • on computers
  • you can chat with any current STAT 253 student, but nobody else (including preceptors)
  • you can DM or email me clarifying questions; if my answer could benefit the broader class, I’ll share it with everyone on Slack
    • note: I do not check email or Slack from roughly 5pm to 7am
  • you can use any materials from this STAT 253 course (videos, course website, textbook, homework solutions, etc.), but you may not use any other resources (internet, ChatGPT, etc.)
  • this is designed to be finished during class, but you can hand it in any time within 24 hours of your class end time (e.g., by 11:10am the next day for the 9:40am section)



Notes

Context

What have we covered so far?

For the Regression task:

  • Unit 1: Model evaluation
  • Unit 2: Building models / selecting predictors
  • Unit 3: Building flexible (nonparametric, nonlinear) models

General concepts that translate to other ML tasks:

  • Overfitting
  • Cross validation
  • Bias-variance tradeoff
  • Algorithms and tuning parameters
  • Preprocessing steps
  • Parametric vs nonparametric models

Review & Reflection

STAT 253 is a survey course of statistical machine learning techniques and concepts. It’s important to continuously reflect on these and how they fit together. The material for class today is designed to help you reflect upon:

  • ML concepts
    • enduring, big picture concepts
    • technical concepts
    • tidymodels code
  • Your progress toward…
    • engagement
    • collaboration
    • preparation (checkpoints)
    • exploration (homework)

Find and make a copy of the following 2 resources. You’ll be given some relevant prompts below, but you should use these materials in whatever way suits you! Take notes, add more content, rearrange, etc.



Exercises

Part 1: Group Assignment 1

Please fill out this Group Agreement Activity with your group. You will “submit” this activity by adding Kelsey as an editor on your Google Doc.



Part 2: Preparing for Quiz 1

Concept Maps

Mark up slides 1–4 of the concept map with respect to the prompts below. Much of this overlaps with HW3.



Enduring, big picture concepts

IMPORTANT to your learning: Respond in your own words.

  • When do we use a supervised vs an unsupervised learning algorithm?
  • Within supervised learning, when do we use a regression vs a classification algorithm?
  • What is the importance of “model evaluation” and what questions does it address?
  • What is “overfitting” and why is it bad?
  • What is “cross-validation” and what problem is it trying to address?
  • What is the “bias-variance tradeoff”?



Technical concepts

On page 2, identify some general themes for each model algorithm listed in the left-hand table:

  • What’s the goal?
  • Is the algorithm parametric or nonparametric?
  • Does the algorithm have any tuning parameters? What are they, how do we tune them, and how is this a Goldilocks problem?
  • What are the key pros & cons of the algorithm?

For each algorithm, you should also reflect upon these important technical concepts:

  • Can you summarize the steps of this algorithm?
  • Is the algorithm parametric or nonparametric? (addressed above)
  • What is the bias-variance tradeoff when working with or tuning this algorithm?
  • Is it important to scale / pre-process our predictors before feeding them into this algorithm?
  • Is this algorithm “computationally expensive”?
  • Can you interpret the technical (RStudio) output for this algorithm (e.g., CV plots)?
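To make the scaling question concrete, here is a minimal sketch of KNN regression written in Python as a language-neutral illustration (the course itself uses R/tidymodels, where this preprocessing is typically handled by a recipe step such as `step_normalize()`). The function names `standardize` and `knn_predict` and the toy data are hypothetical, made up for illustration. Because KNN is built on distances, a predictor measured on a large scale can dominate the distance calculation unless the predictors are standardized first. The number of neighbors `k` is the tuning parameter.

```python
import math
from statistics import mean, stdev


def standardize(train_X, new_X):
    """Center and scale each predictor using the TRAINING data's mean and SD.

    The same training-set centers/scales are applied to new_X, so no
    information from the new cases leaks into the preprocessing.
    """
    columns = list(zip(*train_X))
    centers = [mean(col) for col in columns]
    scales = [stdev(col) for col in columns]

    def scale_row(row):
        return [(x - m) / s for x, m, s in zip(row, centers, scales)]

    return [scale_row(r) for r in train_X], [scale_row(r) for r in new_X]


def knn_predict(train_X, train_y, x0, k):
    """KNN regression: average the outcomes of the k training cases nearest x0."""
    # Euclidean distance from x0 to every training case, sorted nearest first
    neighbors = sorted(
        (math.dist(row, x0), y) for row, y in zip(train_X, train_y)
    )
    return mean(y for _, y in neighbors[:k])


# Hypothetical toy data: y tracks x1 (small scale), while x2 is on a much
# larger scale and would dominate unscaled distances.
train_X = [[1.0, 100], [1.2, 150], [3.0, 120], [3.2, 145]]
train_y = [10, 11, 30, 31]
x0 = [1.1, 140]

raw_pred = knn_predict(train_X, train_y, x0, k=1)
train_std, (x0_std,) = standardize(train_X, [x0])
std_pred = knn_predict(train_std, train_y, x0_std, k=1)
print(raw_pred, std_pred)  # 31 11
```

Without scaling, the nearest neighbor is chosen almost entirely by x2. After standardizing, x1 gets comparable weight and the prediction changes. Small `k` gives flexible, high-variance predictions, while large `k` gives smoother, higher-bias ones.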



Model evaluation

On page 2, do the following for each model evaluation question in the right-hand table:

  • Identify what to check or measure in order to address the question, and how to interpret it.
  • Explain the steps of the CV algorithm.
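As one way to check your explanation of the CV steps, here is a minimal sketch of k-fold cross-validation for a one-predictor least squares model. It is written in Python as a language-neutral illustration; in tidymodels the same workflow is usually expressed with `vfold_cv()` and `fit_resamples()`. The helper names, the made-up data, and the choice of MAE as the error metric are all assumptions for illustration. Real folds are assigned at random; this sketch uses a deterministic assignment so the result is reproducible.

```python
from statistics import mean


def fit_least_squares(x, y):
    """Fit y = b0 + b1 * x by least squares (one predictor)."""
    x_bar, y_bar = mean(x), mean(y)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    b0 = y_bar - b1 * x_bar
    return b0, b1


def kfold_cv_mae(x, y, k):
    """Estimate the test MAE of least squares via k-fold cross-validation."""
    n = len(x)
    # Step 1: split the cases into k folds. For reproducibility this sketch
    # puts every k-th case in the same fold; in practice folds are random.
    folds = [list(range(j, n, k)) for j in range(k)]
    fold_maes = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        # Step 2: fit the model using only the other k - 1 folds ...
        b0, b1 = fit_least_squares([x[i] for i in train_idx], [y[i] for i in train_idx])
        # Step 3: ... then measure its error on the held-out fold.
        errors = [abs(y[i] - (b0 + b1 * x[i])) for i in test_idx]
        fold_maes.append(mean(errors))
    # Step 4: the CV estimate is the average of the k fold-level errors.
    return mean(fold_maes), fold_maes


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.9, 5.2, 6.8, 9.1, 11.0, 13.2, 14.8, 17.1, 19.0, 20.9]  # made-up data
cv_mae, fold_maes = kfold_cv_mae(x, y, k=5)
print(round(cv_mae, 3), [round(m, 3) for m in fold_maes])
```

Every case is used for testing exactly once, and the model is never evaluated on data it was trained on. This is the property that lets CV estimate performance on new data rather than rewarding overfitting.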



Algorithm comparisons

  • Use page 3 to make other observations about the Units 1–3 modeling algorithms and their connections.
  • Use page 4 to address and compare the interpretability & flexibility of the Subset Selection (e.g. backward stepwise), LASSO, Least Squares, and GAM algorithms. Where would you place KNN on this graphic?



Tidymodels Code Comparison

Check out and reflect upon some tidymodels code comparisons here. Copy, use, tweak, and add to this in whatever way suits you!



Part 3: Midterm Learning Reflection

The reflections above address your understanding of key machine learning concepts. At this point in the semester, I’d also like you to take some time to reflect on your engagement with the course and your progress toward the “general skills” learning goals (e.g., collaboration) outlined in the Course Syllabus and on the learning goals page on this website.

To this end, please fill out this Google Form sometime in the next week.



Wrapping Up

  • finish HW3 (due TONIGHT)
    • I’ll post solutions within 3 days so you can review them while studying
  • continue working on Group Assignment 1
    • before you leave today, make sure you have a clear idea of your next steps!
  • study for Quiz 1
  • continue working on your Midterm Learning Reflection