The tidyverse is a collection of R packages for importing, reshaping, wrangling, and visualizing data. All packages share a common design philosophy, that of tidy data: in a tidy dataset, each variable is a column, each observation is a row, and each value is a cell.
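As a rough illustration of what tidy data looks like in practice, the sketch below reshapes a small made-up table with tidyr. The data, the column names, and the choice of pivot_longer() are assumptions for the example, not something prescribed by this page.

```r
library(tibble)
library(tidyr)

# Hypothetical "wide" data: one column per year, so the variable `year`
# is hidden in the column names rather than stored in a column.
wide <- tribble(
  ~country, ~`2019`, ~`2020`,
  "A",      10,      12,
  "B",      20,      25
)

# Reshape so that each variable is a column and each observation
# (one country in one year) is a row.
tidy <- pivot_longer(
  wide,
  cols      = c(`2019`, `2020`),
  names_to  = "year",
  values_to = "cases"
)

tidy
```

After reshaping, the table has one row per country and year, which is the layout most tidyverse functions expect.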
These packages are intended to make statisticians and data scientists more productive by guiding them through workflows that facilitate communication and result in reproducible work products. A good introduction and motivation can be found in the RStudio post "What is the tidyverse?" and in Hadley Wickham’s keynote address at the 2017 RStudio Conference.
You can load the core tidyverse packages with library(tidyverse), which attaches the following packages:
readr (reading data)
ggplot2 (data visualisation)
tibble (handling dataframes)
tidyr (reshaping dataframes)
dplyr (data manipulation, or wrangling)
purrr (functional programming)
stringr (working with strings/characters)
forcats (working with factors, or categorical variables that have a fixed and known set of possible values)
Together these packages form the basis of the tidyverse data science workflow.
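A minimal sketch of what attaching the core packages gives you, assuming the tidyverse is already installed. The small data frame is made up for illustration.

```r
library(tidyverse)

# Functions from the core packages are now available without any
# further library() calls.
df <- tibble(name = c("ada", "grace"))        # tibble
df <- mutate(df, name = str_to_title(name))   # dplyr + stringr
df

# Names of every package in the tidyverse, core and non-core.
all_pkgs <- tidyverse_packages()
all_pkgs
```

Calling tidyverse_packages() is a quick way to check which packages are part of the collection in the version you have installed.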
The tidyverse also includes many other packages, such as readxl and lubridate, which must be loaded explicitly. (As of tidyverse 2.0.0, lubridate is attached together with the core packages.)
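A package outside the core set is attached with its own library() call. Here is a minimal sketch using lubridate, with a date chosen only for illustration. The explicit call is harmless even if the package is already attached.

```r
library(lubridate)

# Parse an ISO-style date string into a Date object.
d <- ymd("2020-07-15")

# Pull out individual components of the date.
year(d)
month(d)
wday(d, label = TRUE)
```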
dplyr
You can work through a few of RStudio’s introductory primers, where you type code and see the results. You’ll learn some of the basics of R, as well as some powerful methods for manipulating data with the dplyr package.
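To give a flavour of the kind of data manipulation dplyr supports, here is a short sketch using the mtcars dataset that ships with R. The choice of columns and the hp > 100 threshold are arbitrary assumptions for the example.

```r
library(dplyr)

# mtcars is a built-in dataset; the columns used here are illustrative.
mpg_by_cyl <- mtcars %>%
  filter(hp > 100) %>%          # keep only rows matching a condition
  group_by(cyl) %>%             # group by number of cylinders
  summarise(
    n        = n(),             # number of cars in each group
    mean_mpg = mean(mpg)        # average fuel economy in each group
  ) %>%
  arrange(desc(mean_mpg))       # sort groups, highest average first

mpg_by_cyl
```

Each verb takes a data frame and returns a data frame, which is what makes chaining them with the pipe read naturally.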
tidymodels
tidymodels is a meta-package for modelling and statistical analysis whose packages share the underlying design philosophy, grammar, and data structures of the tidyverse.
broom takes the messy output of built-in functions in R, such as lm, nls, or t.test, and turns it into tidy data frames.
infer is a modern approach to statistical inference.
recipes is a general data preprocessor with a modern interface.
rsample has infrastructure for resampling data so that models can be assessed and empirically validated.
yardstick contains tools for evaluating models (e.g. accuracy, RMSE).
tidypredict translates some model prediction equations to SQL for high-performance computing.
tidyposterior can be used to compare models using resampling and Bayesian analysis.
tidytext contains tidy tools for quantitative text analysis, including basic text summarization, sentiment analysis, and text modeling.
dials contains tools to create and manage values of tuning parameters and is designed to integrate well with the parsnip package.
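As one small example from this list, the sketch below shows how broom turns the output of lm into data frames. The model formula and the use of the built-in mtcars dataset are assumptions made only for illustration.

```r
library(broom)

# Fit an ordinary linear model with base R.
fit <- lm(mpg ~ wt, data = mtcars)

# tidy() returns one row per model term, with the estimate, standard
# error, test statistic, and p-value as columns.
coefs <- tidy(fit)
coefs

# glance() returns a one-row summary of the model as a whole.
model_summary <- glance(fit)
model_summary
```

Because the results are ordinary data frames, they can be passed straight into dplyr or ggplot2 for further work.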
This page last updated on: 2020-07-15