After a fresh installation of R, you have its core engine, which is also known as base R. You can do a lot of statistical analysis in base R without any additional packages. However, the beauty of the R ecosystem is the ability to install and use R packages. A package is a collection of functions, data sets, and other R objects that other users have written and that are all grouped together under a common name.

2.1 Install once, load every time

Before we get started, there’s a critical distinction between having a package installed on your computer and having a package loaded in R. The simple rule is that you install once, but you must load the package every time you need to use it. When you install R on your computer, you don’t get all of the R packages: only a few come bundled with the base R installation. So, after a fresh installation, there are a few packages installed on your computer, and thousands more that are not installed. Even for one of the few packages that R comes with, R can only use it once that package has also been loaded. So remember,

  • A package must be installed once on your computer before it can be loaded
  • A package must be loaded every single time before it can be used in your R program
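The difference between the two states can be checked from the console. The sketch below is only an illustration: it uses the stats package because it ships with every R installation, and the variable names are made up for this example.

```r
# "stats" comes bundled with R, so this example runs on a fresh installation.
pkg <- "stats"

# Installed: the package's files exist in one of R's library folders on disk.
is_installed <- pkg %in% rownames(installed.packages())

# Loaded: library() brings the package into memory for the current session.
library(pkg, character.only = TRUE)

# search() lists what is attached in this session, e.g. "package:stats".
is_attached <- paste0("package:", pkg) %in% search()

is_installed
is_attached
```

If you restart R, the package is still installed, but any package you loaded yourself with library() has to be loaded again.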

This relationship is illustrated in the image below:

2.2 The package panel

In the lower right-hand panel in RStudio there is a tab labelled Packages. Click on the tab, and you’ll see a list of packages that looks something like this:

You are unlikely to have exactly the same list of packages, as we have installed quite a few packages. Every row in this panel corresponds to a different package, and every column is a useful piece of information about that package, namely:

  • The check box on the left column indicates whether or not the package is loaded into the computer’s memory.
  • The text immediately to the right of the check box is the name of the package.
  • The short passage of text next to the package’s name is a brief description of the package.
  • The number next to the description is the version of the package you have installed.
  • The little x-mark at the right-most column is a button that you can click to uninstall the package from your computer.
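The same information is available from the console. This is a rough console counterpart to the panel, not a description of how RStudio builds it; the object name pkgs is an arbitrary choice, and "some_package" is a placeholder rather than a real package.

```r
# Name and version of every installed package (the panel's rows).
pkgs <- as.data.frame(
  installed.packages()[, c("Package", "Version"), drop = FALSE],
  stringsAsFactors = FALSE
)
head(pkgs)

# The brief description and the installed version of a single package.
packageDescription("stats")$Title
packageVersion("stats")

# Packages currently attached in this session (the ticked check boxes).
(.packages())

# The x-mark button corresponds to remove.packages(); left commented out
# so that nothing is uninstalled by accident.
# remove.packages("some_package")
```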

2.3 Installing the tidyverse

You should now install some essential packages that you will need in the Data Analytics courses. Copy and paste the following command into the R console to do this:

# install the major packages from the tidyverse
install.packages("tidyverse")

This will take a while as tidyverse is a collection of packages and R will have to install all dependencies.

2.4 Installing the tidyverse if you have a Mac

Unfortunately, installing the tidyverse isn’t always straightforward on macOS 10.14 (Mojave), which was released on September 24, 2018.

To solve issues that may arise with a missing xml2 library, please do the following:

  1. Open Terminal (the tab right next to Console)
  2. Type
xcode-select --install

Be careful: you need two dashes before install. A software update popup window should appear, asking if you want to install the command line developer tools. Click on “Install” (you don’t need to click on “Get Xcode”).

  3. Go to https://brew.sh and copy the long command under “Install Homebrew” (it starts with /bin/bash -c "$(curl -fsSL), paste it into Terminal, and press Enter.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

This installs Homebrew, which is special software that lets you install Unix-y programs from the terminal.

  4. Type the following command in Terminal to install libxml2
brew install libxml2
  5. Then, within RStudio, type
install.packages("xml2")
  6. Finally, you can now proceed with the installation of the tidyverse
install.packages("tidyverse")

2.5 Installing further packages

Once the tidyverse collection of packages installs and you get back to the R prompt >, you can install a series of packages that will be useful later in the course. You can copy/paste the code below; please note that this will take quite a while, so grab a coffee.

# install these packages as well
list_of_packages <- c(
  "moderndive",   # https://www.moderndive.com/
  "DT",           # Interactive HTML tables, via the DataTables JavaScript library
  "unvotes",      # How countries have voted in UN resolutions
  "gridExtra",    # Miscellaneous Functions for "Grid" Graphics
  "GGally",       # Allows us to create a correlations/scatterplots matrix 
  "tidyquant",    # Download and manipulate financial data
  "wbstats",      # Download World Bank Data
  "eurostat",     # Download data from Eurostat
  "fpp2",         # Time Series and Forecasting functions, with data too 
  "car",          # Applied Regression- allows to calculate VIF, Variance Inflation Factor
  "gapminder",    # Data on life expectancy, GDP/capita, and population by country and year
  "nycflights13", # Data on all domestic flights through NYC's 3 airports (JFK, EWR, LGA) in 2013
  "fivethirtyeight", # Data used in articles that appeared on the fivethirtyeight.com website
  "corrr",        # correlation in R
  "plotly",       # interactive visualizations
  "sf",           # tidy geo-computing
  "cowplot",      # ggplot multiple figures addon
  "coefplot",     # plot coefficients from fitted models
  "interplot",    # plot effects of variables in interaction terms
  "scales",       # scale functions for visualisations 
  "ggridges",     # ridgeline plots in ggplot2
  "skimr",        # nice dataframe summaries
  "leaflet",      # interactive maps
  "ggrepel",      # geoms for ggplot2 to repel overlapping text labels
  "viridis",      # Colour Maps
  "rvest",        # scrape webpages
  "usethis",      # automation of package and project setup
  "devtools",     # installing packages from Github
  "tidytext",     # text mining
  "here",         # finding your files 
  "mosaic"        # summary stats, using mosaic::favstats()
)

install.packages(list_of_packages, dependencies=TRUE, repos = "https://cran.rstudio.com/")
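If you run the command again later, R will download everything a second time. One way to avoid that is to install only what is missing. The sketch below is an illustration under assumptions: `wanted` is a short stand-in for the full list, filled with packages that ship with R so that the example runs without downloading anything.

```r
# Stand-in for list_of_packages; "stats" and "utils" come bundled with R.
wanted <- c("stats", "utils")

# Names of everything already installed on this computer.
already_installed <- rownames(installed.packages())

# Keep only the packages that are not installed yet.
missing_packages <- setdiff(wanted, already_installed)

# Install only if there is something left to install.
if (length(missing_packages) > 0) {
  install.packages(missing_packages, repos = "https://cran.rstudio.com/")
}

missing_packages
```

To apply this to the course packages, you would set `wanted <- list_of_packages` instead.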

2.6 Install from GitHub

Most of the time the packages that you’ll want to install have been made available on CRAN, the Comprehensive R Archive Network, so you use the install.packages("package_name") function. Sometimes people write packages that are not submitted to CRAN, and sometimes you might want to try out a package that is currently under development. In these situations, people who write packages will often make them available on GitHub. We can install packages directly from GitHub, using the devtools package.

The first thing you need to do is install devtools, which is easy because that package is available on CRAN and hopefully you installed it along with the packages listed earlier. If not,

install.packages("devtools")

Once you install devtools, you must tell R that you will be using it by typing library(devtools). Then, you can use the install_github command to install a package directly from a GitHub repository. For example, there’s an R data package featuring every Lego set from 1970 to 2015 put together by Sean Kross.

library(devtools)

install_github("seankross/lego") #install the lego package directly from GitHub
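As an aside, R also lets you call a function from an installed package without loading the whole package, by writing the package name, two colons, and the function name. The sketch below shows the form with the stats package, which ships with R, so that it runs anywhere; the variable name is arbitrary.

```r
# package::function() calls a function without a prior library() call.
middle_value <- stats::median(c(3, 1, 2))
middle_value

# The same form works for the GitHub installation, provided devtools is
# installed and you are online; it is commented out here for that reason.
# devtools::install_github("seankross/lego")
```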

R fetches and installs the package from GitHub, and we now have the new lego package to play with. To verify that everything worked properly, let’s load the lego package and look at its legosets dataframe:

library(lego)      #load the lego package into the computer's memory
library(tidyverse) #load the tidyverse, which provides glimpse(), %>% and ggplot() used below

legosets          #view the legosets dataframe
## # A tibble: 6,172 x 14
##    Item_Number Name     Year Theme  Subtheme  Pieces Minifigures Image_URL    GBP_MSRP USD_MSRP CAD_MSRP EUR_MSRP Packaging Availability
##    <chr>       <chr>   <int> <chr>  <chr>      <int>       <int> <chr>           <dbl>    <dbl>    <dbl>    <dbl> <chr>     <chr>       
##  1 10246       Detect~  2015 Advan~ "Modular~   2262           6 http://imag~   133.     160.      200.    150.   Box       Retail - li~
##  2 10247       Ferris~  2015 Advan~ "Fairgro~   2464          10 http://imag~   150.     200.      230.    180.   Box       Retail - li~
##  3 10248       Ferrar~  2015 Advan~ "Vehicle~   1158          NA http://imag~    70.0    100.      120.     90.0  Box       LEGO exclus~
##  4 10249       Toy Sh~  2015 Advan~ "Winter ~    898          NA http://imag~    60.0     80.0      NA      70.0  Box       LEGO exclus~
##  5 10581       Ducks    2015 Duplo  "Forest ~     13           1 http://imag~     9.99     9.99     13.0     9.99 Box       Retail      
##  6 10582       Animals  2015 Duplo  "Forest ~     39           2 http://imag~    17.0     20.0      25.0    20.0  Box       Retail      
##  7 10583       Fishin~  2015 Duplo  "Forest ~     32           2 http://imag~    20.0     25.0      30.0    25.0  Box       Retail      
##  8 10584       Forest   2015 Duplo  "Forest ~    105           3 http://imag~    50.0     60.0      70.0    60.0  Box       Retail      
##  9 10585       Mom an~  2015 Duplo  ""            13           2 http://imag~     8.99     9.99     13.0     9.99 Box       Retail      
## 10 10586       Ice Cr~  2015 Duplo  ""            11           2 http://imag~    13.0     15.0      15.0    15.0  Box       Retail      
## # ... with 6,162 more rows
glimpse(legosets) #examine the structure of the dataframe: variables, observations, types of variables, etc.
## Rows: 6,172
## Columns: 14
## $ Item_Number  <chr> "10246", "10247", "10248", "10249", "10581", "10582", "10583", "10584", "10585", "10586", "10587", "10589", "1...
## $ Name         <chr> "Detective's Office", "Ferris Wheel", "Ferrari F40", "Toy Shop", "Ducks", "Animals", "Fishing Trip", "Forest",...
## $ Year         <int> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 20...
## $ Theme        <chr> "Advanced Models", "Advanced Models", "Advanced Models", "Advanced Models", "Duplo", "Duplo", "Duplo", "Duplo"...
## $ Subtheme     <chr> "Modular Buildings", "Fairground", "Vehicles", "Winter Village", "Forest Animals", "Forest Animals", "Forest A...
## $ Pieces       <int> 2262, 2464, 1158, 898, 13, 39, 32, 105, 13, 11, 52, 13, 29, 19, 26, 105, 38, 87, 63, 24, 47, 29, 19, 38, 17, 2...
## $ Minifigures  <int> 6, 10, NA, NA, 1, 2, 2, 3, 2, 2, 3, 1, NA, NA, NA, NA, 1, 2, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, 2,...
## $ Image_URL    <chr> "http://images.brickset.com/sets/images/10246-1.jpg", "http://images.brickset.com/sets/images/10247-1.jpg", "h...
## $ GBP_MSRP     <dbl> 132.99, 149.99, 69.99, 59.99, 9.99, 16.99, 19.99, 49.99, 8.99, 12.99, 17.99, 12.99, 16.99, 12.99, 19.99, 44.99...
## $ USD_MSRP     <dbl> 159.99, 199.99, 99.99, 79.99, 9.99, 19.99, 24.99, 59.99, 9.99, 14.99, 19.99, 14.99, 19.99, 14.99, 24.99, 49.99...
## $ CAD_MSRP     <dbl> 200, 230, 120, NA, 13, 25, 30, 70, 13, 15, 25, 15, 25, 15, 30, 60, 30, 50, 45, 30, 40, 30, 20, 30, 15, 20, 25,...
## $ EUR_MSRP     <dbl> 149.99, 179.99, 89.99, 69.99, 9.99, 19.99, 24.99, 59.99, 9.99, 14.99, 19.99, 14.99, 19.99, 14.99, 24.99, 59.99...
## $ Packaging    <chr> "Box", "Box", "Box", "Box", "Box", "Box", "Box", "Box", "Box", "Box", "Box", "Box", "Box", "Box", "Box", "Box"...
## $ Availability <chr> "Retail - limited", "Retail - limited", "LEGO exclusive", "LEGO exclusive", "Retail", "Retail", "Retail", "Ret...

The dataframe has 14 variables (or columns) and 6,172 observations (rows). Besides the item number, year, theme/subtheme and the number of pieces and minifigures contained in each Lego box, we also have the recommended retail prices in GBP, USD, CAD, and EUR. While we are at it, let us have a quick look at how Lego prices (in GBP) have evolved over the years.

avg_price_per_year <- legosets %>%  # create "avg_price_per_year" by taking legosets, and then
  filter(!is.na(GBP_MSRP)) %>%      # keep only entries that have a GBP price (GBP_MSRP), and then
  group_by(Year) %>%                # group prices by year, and then
  summarise(Price = mean(GBP_MSRP)) # create variable "Price" = yearly average of GBP_MSRP

ggplot(avg_price_per_year, 
       mapping = aes(x = Year, y = Price)) +  # time series plot: x=Year, y=Price
  geom_point(size = 0.5) +                    # draw one point per year
  geom_line(size = 0.5) +                     # add the black line between points
  geom_smooth(se = FALSE) +                   # fit a trend line; "se = FALSE" removes the error band around it
  labs(x = "Year",   
       y = "Price (GBP)", 
       title = "Average price of LEGO sets",
       subtitle = "Amounts are reported in current GBP",
       caption = "Source: LEGO") +
  theme_bw()

There is a clear upward trend in average GBP prices.
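To see what the filter, group, and summarise steps compute, here is a minimal sketch on a made-up five-row data frame. The values are invented for illustration and are not taken from the lego package. The sketch uses base R's aggregate() rather than dplyr so that it runs without any extra packages; the column names simply mirror those in legosets.

```r
# Invented toy data: two sets from 2014, three from 2015 (one with no price).
toy <- data.frame(
  Year     = c(2014, 2014, 2015, 2015, 2015),
  GBP_MSRP = c(10, 20, 30, NA, 50)
)

# Step 1: drop rows with a missing GBP price.
toy_complete <- toy[!is.na(toy$GBP_MSRP), ]

# Steps 2 and 3: group by Year and take the mean price within each year.
toy_avg <- aggregate(GBP_MSRP ~ Year, data = toy_complete, FUN = mean)
names(toy_avg)[2] <- "Price"

toy_avg
```

The result has one row per year: 15 for 2014 (the mean of 10 and 20) and 40 for 2015 (the mean of 30 and 50, with the missing price left out).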

And since we are talking about LEGO, here is a fun application of creating LEGO mosaics from photos using R and the tidyverse.

2.7 Updating packages

Every now and then the authors of packages release updated versions. The updated versions often add new functionality, fix bugs, and so on. It’s a good idea to update your packages periodically.

There’s an update.packages() function, but it’s probably easier to stick with the RStudio tool. In the Packages tab, click on the Update Packages button. This will bring up a window that looks like the one shown below:

In this window, each row refers to a package that needs to be updated. You can select which updates to install by checking the boxes on the left. If you feel lazy, click the Select All button, and then Install Updates. This might take a while to complete depending on how fast your internet connection is.
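If you prefer the console, the same job can be done with old.packages() and update.packages(). The sketch below is a hedged illustration: the function name update_all_packages and the choice of repository are assumptions made for this example, and the call itself is left commented out because it needs an internet connection.

```r
# Illustrative helper; the name and the default repository are arbitrary.
update_all_packages <- function(repos = "https://cran.rstudio.com/") {
  # old.packages() returns NULL when nothing is out of date,
  # otherwise a matrix with one row per outdated package.
  outdated <- old.packages(repos = repos)
  if (is.null(outdated)) {
    message("All packages are up to date.")
    return(invisible(character(0)))
  }
  # ask = FALSE updates everything without prompting, like Select All.
  update.packages(ask = FALSE, repos = repos)
  invisible(unname(outdated[, "Package"]))
}

# Run this yourself when you are online:
# update_all_packages()
```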

2.8 Updating R

About twice a year, a new version of R is released, and packages are updated to remain compatible with it. A side effect is that when you update to the newest version of R, you lose all the packages that you have downloaded and installed. Unfortunately, you need to reinstall them for the new version of R, even though they will typically behave just like the old ones.
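One way to make the reinstallation less painful is to save the names of your installed packages before upgrading and compare against that list afterwards. This is a sketch under assumptions: the file name is hypothetical, and it is written to a temporary folder only so that the example leaves nothing behind. In practice you would pick a permanent location, since the temporary folder is deleted when R closes.

```r
# Hypothetical file location; use a permanent folder in real use.
pkg_file <- file.path(tempdir(), "my_packages.rds")

# Before upgrading R: save the names of all installed packages.
saveRDS(rownames(installed.packages()), pkg_file)

# After upgrading R: read the list back and see what is now missing.
previous_packages <- readRDS(pkg_file)
to_reinstall <- setdiff(previous_packages, rownames(installed.packages()))

# Reinstall whatever is missing (commented out, as it needs the internet).
# install.packages(to_reinstall)

to_reinstall
```

Run in a single session, as here, nothing is missing, so to_reinstall is empty. After a real upgrade it would hold the packages that need reinstalling; anything you originally installed from GitHub would have to be installed from there again.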



This page last updated on: 2020-07-14