22  Unsupervised Learning Review

Settling In

  • Sit with the same group as last class
  • Check Slack for recent announcements and other posts
  • Prepare to take notes



End-of-Course Survey

Just as feedback and suggestions from instructors (on assignments, in office hours, etc.) are helpful for guiding your learning, your feedback helps instructors decide what to update and improve in our courses and teaching.

I would GREATLY appreciate hearing your thoughts on what worked well for you in this class, and what you think could be improved. To that end, please take ~15 minutes to fill out the End-of-Course Survey to share your thoughts. FYI: Your responses to this survey are anonymous, and I will not have access to them until after final grades have been submitted.

You can access this survey via Moodle:

  • Find the “End of Course Survey (EOCS)” link in the Surveys section of our Moodle page



Preparing for Quiz 3

Reminders

  • During class time on Thursday (May 1)
  • Format: same as Quizzes 1 and 2
  • Content: cumulative, but focus on unsupervised learning
  • Study Tips:
    • Use the course Learning Goals as a study guide
    • Fill out the STAT 253 Concept Maps
    • Review old CPs, HWs, and in-class exercises
    • Work on Group Assignment 3 (today!)

NOTE: the revision process will be slightly different for this quiz since you are taking it on the last day of class



Final Learning Reflection

The Final Learning Reflection assignment will be due by 11:59 pm on Saturday, May 10.

  • It will include questions similar to those in the earlier reflections, as well as:
    • An option to reflect on your Quiz 3 feedback (after picking up your graded quiz during office hours)
    • A brief exploration of a new statistical machine learning algorithm we did not cover during our class
  • Budget a few hours for this over the course of finals week.
    • Think of this as the time you would have spent studying for a final exam or wrapping up a final project. (This is your only “final” assignment for this class!)
    • We hope it will also provide a useful opportunity to reflect back on everything you have learned this semester. Give yourself enough time and space for meaningful reflection!
  • NOTE: We will not meet in person during our final exam block. Use this time to finish your reflection!

Important: This Final Learning Reflection will play an important role in your final grade. (Review the Course Syllabus for details.) Plan accordingly!




Group Assignment 3

Use the rest of class time today to continue your work on Group Assignment 3.

Suggestions / Reminders

  • Remember that you are writing for a general audience (not your instructor / preceptors), so adjust your writing accordingly.
  • If you make any modifications to the data, make sure they are clearly explained and documented in your report.
  • Use data visualizations thoughtfully!
  • Use an Appendix for supplemental code.
  • Don’t forget to complete the Group Assignment Feedback Survey (link on Moodle) after you submit your report on Friday!




Additional Review Questions

If you finish all of the tasks above, talk through the following review questions with your group. Otherwise, come back to this while studying for the quiz.

Part 1: enduring, big picture concepts

Slide 1 of the STAT 253 Concept Maps presents a set of enduring, big picture questions that are critical to doing, critiquing, and understanding machine learning analyses. I hope these stick with you for years to come. They are also important for Quiz 3.

Respond in your own words!

  • When do we use a supervised vs an unsupervised learning algorithm?
  • Within supervised learning, when do we use a regression vs a classification algorithm?
  • What is the importance of model evaluation and what questions does it address?
  • What is overfitting and why is it bad?
  • What is cross-validation? How does it work and what problem is it trying to address?
  • What is the bias-variance tradeoff? What models tend to have high bias? High variance?
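
To make the overfitting and cross-validation questions concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the data are simulated, the helper names (`knn_predict`, `mse`, `cv_mse`) are hypothetical, and K-nearest-neighbors regression is used only as a convenient model with a flexibility setting. It compares in-sample error with 5-fold cross-validated error for a few values of K.

```python
import random

def knn_predict(train, x0, k):
    """Predict y at x0 as the average y of the k training points closest to x0."""
    nearest = sorted(train, key=lambda pt: abs(pt[0] - x0))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, test, k):
    """Mean squared error on `test` when neighbors are drawn from `train`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in test) / len(test)

def cv_mse(data, k, folds=5):
    """Average held-out MSE over `folds` folds; each point is held out exactly once."""
    errors = []
    for f in range(folds):
        test = data[f::folds]  # points whose index i satisfies i % folds == f
        train = [pt for i, pt in enumerate(data) if i % folds != f]
        errors.append(mse(train, test, k))
    return sum(errors) / folds

# Simulated data (an assumption, not course data): y = x^2 plus random noise
rng = random.Random(253)
data = [(i / 10, (i / 10) ** 2 + rng.gauss(0, 2)) for i in range(50)]
rng.shuffle(data)  # shuffle so the folds are not ordered by x

for k in (1, 5, 25):
    print(f"K={k}: in-sample MSE={mse(data, data, k):.2f}, CV MSE={cv_mse(data, k):.2f}")
```

With K = 1, each point is its own nearest neighbor, so the in-sample error is exactly 0 no matter how noisy the data are. The cross-validated error is computed only on points the model did not see, which is why it gives a more honest picture of how the model would do on new data.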

Part 2: supervised learning

Slides 2–8 present a variety of regression & classification algorithms & concepts. On Quiz 3, you won’t be asked about the nitty gritty (eg: how to interpret coefficients, make predictions, do the algorithm by hand). But you should have the following bigger picture understanding of how all of the algorithms fit together.

For each algorithm:

  • In what situations is it useful? Could you use it for regression, classification, or both?
  • Is the algorithm parametric or nonparametric? What’s the difference?
  • In general, how does the algorithm work?

Part 3: unsupervised learning

Slide 9 presents important unsupervised tasks and algorithms. In the table, take notes on the following:

  • What’s the goal of clustering? Dimension reduction?
  • How are these goals similar? How do they differ?
  • Reflect upon the hierarchical clustering algorithm:
    • What are the steps of the algorithm?
    • Can you implement this algorithm by hand for a small sample?
    • Can you interpret and use a dendrogram?
    • What’s the difference between complete, single, centroid, and average linkage? What role do these play in hierarchical clustering?
    • What are some pros and cons of this algorithm?
  • Reflect upon the K-means algorithm:
    • What is K?
    • What values can K take and what impact does this have on our results?
    • What are the steps of the algorithm?
    • What are some pros and cons of this algorithm?
  • Reflect upon the principal component analysis algorithm.
    • What’s the goal?
    • What does PCA produce?
    • Can you interpret PCs and understand how they’re defined (the idea, not the math)?
    • Can you interpret loadings plots, scree plots, and score plots?
    • What are some pros and cons of this algorithm?
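
If you want to check a by-hand hierarchical clustering, the sketch below may help. It is a bare-bones illustration in plain Python, not how a statistical package implements the algorithm: the five-point, one-dimensional sample is made up, and the function names (`linkage_distance`, `agglomerate`) are hypothetical. It repeatedly merges the two closest clusters and records the height of each merge, which is the information a dendrogram displays.

```python
def linkage_distance(a, b, linkage):
    """Distance between clusters a and b (lists of numbers) under a linkage rule."""
    pairwise = [abs(x - y) for x in a for y in b]
    if linkage == "single":    # closest pair of points
        return min(pairwise)
    if linkage == "complete":  # farthest pair of points
        return max(pairwise)
    if linkage == "average":   # average over all pairs
        return sum(pairwise) / len(pairwise)
    if linkage == "centroid":  # distance between cluster means
        return abs(sum(a) / len(a) - sum(b) / len(b))
    raise ValueError(f"unknown linkage: {linkage}")

def agglomerate(points, linkage):
    """Return the merge history as (cluster_a, cluster_b, height) tuples."""
    clusters = [[p] for p in points]  # start with every point in its own cluster
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters that is closest under the chosen linkage
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage_distance(clusters[ij[0]], clusters[ij[1]], linkage),
        )
        height = linkage_distance(clusters[i], clusters[j], linkage)
        history.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return history

sample = [1, 2, 4, 8, 13]  # made-up sample, small enough to verify by hand
for linkage in ("single", "complete"):
    print(linkage)
    for a, b, h in agglomerate(sample, linkage):
        print(f"  merge {a} + {b} at height {h}")
```

For this sample, single linkage merges at heights 1, 2, 4, 5 and adds one point at a time to a growing cluster. Complete linkage merges at heights 1, 3, 5, 12 and pairs 8 with 13 before the final merge. Same data, same algorithm, different linkage, different dendrogram.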

Part 4: supervised + unsupervised

Principal Component Regression (PCR) combines supervised and unsupervised ideas. It has been added to Slide 11.

Reflect upon the following:

  • What is the goal of PCR?
  • What are the general steps of the PCR algorithm?
  • How does PCR differ from LASSO? How does it differ from just kicking out some predictors (eg: using backward stepwise regression)?
  • What are some pros and cons of PCR relative to LASSO or other variable selection techniques?
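
As a rough sketch of the general PCR steps, the plain-Python example below standardizes two predictors, computes their first principal component, and regresses the outcome on that single component. It is an illustration under stated assumptions, not a general implementation: the data are made up, the function names (`standardize`, `first_pc`, `simple_ols`) are hypothetical, and the shortcut used for the eigenvalues only works for exactly two standardized predictors.

```python
import math

def standardize(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def first_pc(z1, z2):
    """First PC of two standardized predictors: (loadings, scores, variance share)."""
    n = len(z1)
    r = sum(a * b for a, b in zip(z1, z2)) / (n - 1)  # correlation of z1 and z2
    # The covariance matrix is [[1, r], [r, 1]], with eigenvalues 1 + r and 1 - r.
    # The larger one is 1 + |r|; its eigenvector is (1, 1)/sqrt(2) or (1, -1)/sqrt(2).
    sign = 1.0 if r >= 0 else -1.0
    loadings = (1 / math.sqrt(2), sign / math.sqrt(2))
    scores = [loadings[0] * a + loadings[1] * b for a, b in zip(z1, z2)]
    return loadings, scores, (1 + abs(r)) / 2

def simple_ols(x, y):
    """Least squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

# Made-up data: two strongly correlated predictors and an outcome
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
y = [3.0, 4.1, 6.8, 8.2, 9.9, 12.5]

loadings, scores, share = first_pc(standardize(x1), standardize(x2))  # unsupervised step
intercept, slope = simple_ols(scores, y)                              # supervised step
print(f"loadings={loadings}, share of variance={share:.3f}")
print(f"y-hat = {intercept:.2f} + {slope:.2f} * PC1")
```

Notice that the outcome `y` plays no role in building the component, and that both original predictors contribute to the one component that ends up in the regression.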




Wrapping Up

Upcoming Due Dates:

  • Quiz 3: during next class (5/1)
  • Group Assignment 3: due Friday (5/2)
  • Final Learning Reflection: due Saturday (5/10) by 11:59 pm, during finals week