# Turn quant_var into a Low/Med/High version
data <- data %>%
    mutate(cat_var = case_when(
        quant_var < 10 ~ "Low",
        quant_var >= 10 & quant_var <= 20 ~ "Med",
        quant_var > 20 ~ "High"
    ))
# Turn cat_var (A, B, C categories) into another categorical variable
# (collapse A and B into one category)
data <- data %>%
    mutate(new_cat_var = case_when(
        cat_var %in% c("A", "B") ~ "A or B",
        cat_var == "C" ~ "C"
    ))
# Turn a categorical variable (x1) encoded as a numerical 0/1/2 variable into a different quantitative variable
# Doing this for multiple variables allows you to create an index
data <- data %>%
    mutate(x1_score = case_when(
        x1 == 0 ~ 10,
        x1 == 1 ~ 20,
        x1 == 2 ~ 50
    ))
# Add together multiple variables with mutate
data <- data %>%
    mutate(index = x1_score + x2_score + x3_score)
R Resources
Tidymodels resources
Tidyverse resources
- Brianna Heggeseth’s COMP/STAT 112 website (with code examples and videos)
- R for Data Science
- Exploratory Data Analysis with R
- Johns Hopkins Tidyverse course text
Visualization resources
General R resources
- RStudio cheatsheets
- Advanced R
- R Programming Wikibook
- Debugging in R
Some example code
Creating new variables
case_when() from the dplyr package is a very versatile function for creating new variables based on existing variables. This can be useful for creating categorical or quantitative variables and for creating indices from multiple variables.
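As a minimal, self-contained sketch of these uses, the snippet below builds a small made-up data frame and applies case_when() inside mutate(). The toy values and the helper name to_score are assumptions for illustration only; the column names mirror those used elsewhere in this section.

```r
library(dplyr)

# Hypothetical toy data (made up for illustration)
data <- data.frame(
    quant_var = c(5, 10, 20, 25, NA),
    x1 = c(0, 1, 2, 1, 0),
    x2 = c(2, 0, 1, 1, 2)
)

# Hypothetical helper: map a 0/1/2 code to a score of 10/20/50
to_score <- function(x) {
    case_when(
        x == 0 ~ 10,
        x == 1 ~ 20,
        x == 2 ~ 50
    )
}

data <- data %>%
    mutate(
        # Quantitative -> categorical; rows matching no condition
        # (here, the missing value) are left as NA
        cat_var = case_when(
            quant_var < 10 ~ "Low",
            quant_var >= 10 & quant_var <= 20 ~ "Med",
            quant_var > 20 ~ "High"
        ),
        # Coded categorical -> quantitative scores
        x1_score = to_score(x1),
        x2_score = to_score(x2),
        # Index built by adding the scores together
        index = x1_score + x2_score
    )

print(data)
```

Conditions are checked in order and the first match wins. A row that matches none of them gets NA, so it is worth checking the new variable for unexpected missing values after recoding.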