git-STAA-577
Slides, code, cheat sheets, and RStudio lab notebooks for the "Applied Machine Learning" course, Spring 2019
Project maintained by stufield
GitHub Repository for STAA 577
Overview
RStudio lab notebooks, full R code, cheat sheets, resources, and ad
hoc notes from the “Applied Machine Learning” course, Spring 2019.
Why use GitHub?
We have decided to place the course materials in a GitHub
repository:
- to familiarize you with this widely used collaborative coding tool
- so that you will have access to them beyond your tenure at CSU when
you venture into the official job market.

Jenny Bryan and Jim Hester summarize the benefits of GitHub in this
fantastic reference. If you ever plan to use version control with
GitHub, I strongly recommend reading it in detail.
Course Lab Content
- Intro Labs
  - Lab 00: Basic Exploring
  - Lab 01: Subsetting (data frames)
  - Lab 02: Data Wrangling with dplyr and the tidyverse
- Lab 03: Skipped to synchronize course and textbook ISLR
- Lab 04: Classification
  - The S&P Stock Market Data Set
  - Logistic Regression
  - Discriminant Analysis
  - KNN: K-Nearest Neighbors
- Lab 05: Cross Validation
  - The Auto Data Set
  - Cross Validation (by hand)
  - LOOCV (leave-one-out)
  - K-fold CV
  - The Bootstrap
- Lab 06: Subset Selection
  - The Hitters Data Set
  - Subset Selection
  - Shrinkage Methods: Ridge Regression
  - Shrinkage Methods: The Lasso
- Lab 07: Beyond Linearity
  - The Wage Data Set
  - Polynomial Regression
  - Polynomial Logistic Regression
  - Spline Regression
  - Generalized Additive Models
- Lab 08: Tree-based Methods
  - The Carseats Data Set
  - Classification Trees
  - Regression Trees
  - Bagging
  - Boosting
  - Appendices
  - Resources
- Lab 09: Support Vector Machines
  - Create training data
  - Support Vector Classifier
  - Support Vector Machine
  - ROC curves
- Lab 10: Unsupervised Learning
  - Principal Component Analysis (PCA)
  - K-means Clustering
  - Hierarchical Clustering
Datasets for STAA 577
- nycflights13
  - New York City airport flight data from 2013 (must install)
  - install with install.packages("nycflights13", repos = "http://cran.rstudio.com")
- iris
  - classic iris flower data set from Fisher (comes with R installed)
- mtcars
  - USA Motor Trend canonical data set (comes with R installed)
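As a quick check that the data sets listed above are available, something like the following can be run in a fresh R session. This is a minimal sketch rather than part of the course labs; the object name `flights` is chosen here for illustration, and the nycflights13 step is skipped if the package has not been installed yet.

```r
# iris and mtcars ship with base R (the 'datasets' package), so no install is needed
data(iris)
data(mtcars)
str(iris)      # structure: measurements plus a Species factor
head(mtcars)   # first rows of the Motor Trend data

# nycflights13 must be installed first; check before trying to use it
if (requireNamespace("nycflights13", quietly = TRUE)) {
  flights <- nycflights13::flights   # illustrative object name
  print(dim(flights))
} else {
  message("nycflights13 is not installed; install it with install.packages() first")
}
```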
Cheatsheets
Previewing HTML on GitHub
- Fairly useful tool to preview HTML docs without having to clone
the repository
- Right-click the *.html file, copy the link, then go
here and paste the GitHub-specific HTML link
Sad But True
Stu’s Looping Rules for R
1. Always use a vectorized solution over iteration when possible,
   otherwise … go to #2.
2. Use a functional, usually one of the apply() family or a
   loop-wrapper function, since R is a functional language and
   functionals improve readability, unless …
   - modifying in place: if you are modifying or transforming
     certain subsets (columns) of a data frame.
   - recursive problems: whenever an iteration depends on the
     previous iteration, a loop is better suited because a functional
     does not have access to variables outside the present lexical
     scope.
   - while loops: in problems where it is unknown how many
     iterations will be performed, while-loops are well suited and
     preferred over a functional.
3. If you must use a loop, ensure the following:
   - Initialize new objects: prior to the loop, allocate the
     necessary space ahead of time. Do NOT “grow” a vector on-the-fly
     within a loop (this is terribly slow).
   - Optimize operations: do NOT perform operations inside the
     loop that could be done either up front or applied in a
     vectorized fashion following the loop. Enter the loop, do the
     bare minimum, then get out.
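The rules above can be sketched in a few lines of R. This is a minimal illustration under assumed toy data, not code taken from the course labs; the object names (`x`, `sq_vec`, `sq_fun`, `df`, `fib`, `steps`) are hypothetical and chosen only for the example.

```r
x <- seq_len(1000)

# Rule 1: vectorized solution, no explicit iteration
sq_vec <- x^2

# Rule 2: a functional from the apply() family (vapply checks the return type)
sq_fun <- vapply(x, function(i) i^2, numeric(1))

# Rule 2 exception, modifying in place: transform selected columns of a data frame
df <- data.frame(a = 1:3, b = 4:6, c = letters[1:3])
for (col in c("a", "b")) {
  df[[col]] <- df[[col]] * 2
}

# Rule 2 exception, recursive problem: each step depends on the previous ones.
# Rule 3: the result vector is allocated before the loop, never grown inside it.
n_fib <- 20
fib <- numeric(n_fib)
fib[1:2] <- 1
for (i in 3:n_fib) {
  fib[i] <- fib[i - 1] + fib[i - 2]
}

# Rule 2 exception, while loop: the number of iterations is not known in advance
val <- 100
steps <- 0L
while (val >= 1) {
  val <- val / 2
  steps <- steps + 1L
}

# Anti-pattern, shown for contrast only: growing a vector inside the loop
# out <- c(); for (i in x) out <- c(out, i^2)
```

The vectorized and functional versions give the same result here; the vectorized form is usually the shorter and faster of the two, which is why it comes first in the rules.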
Hadley Wickham Links
Jenny Bryan’s Links
Max Kuhn’s Links
https://github.com/topepo
Modeling Framework (thx Max Kuhn)
Memory Usage and rsample: The rsample package is smarter than
you might think.
Vignettes
What is the Tidyverse?
Information about the:
Created on 2019-01-27 by
Rmarkdown (v1.11) and R version
3.5.2 (2018-12-20).