After a fresh installation of R, you have its core engine, also known as base R. You can do a lot of statistical analysis in base R without any additional packages. However, the beauty of the R ecosystem is the ability to install and use R packages. A package is a collection of functions, data sets, and other R objects that other users have written, all grouped together under a common name.
Before we get started, there’s a critical distinction between having a package installed on your computer and having a package loaded in R. The simple rule is that you install once, but you must load the package every time you need to use it. When you install R on your computer, you don’t get all of the R packages: only a few come bundled with the base R installation. So, after a fresh installation, there are a few packages installed on your computer, and thousands more that are not installed. For R to be able to use even one of the packages it comes with, that package must also be loaded. So remember: a package must be installed before it can be loaded, and it must be loaded before it can be used.
This relationship is illustrated in the image below:
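The installed/loaded distinction can also be checked from the console. The following is a minimal sketch, not part of the original page: the helper names is_installed and is_attached are made up for illustration, and the example uses parallel, a package that ships with R but is normally not attached in a fresh session.

```r
pkg <- "parallel"  # bundled with R, but usually not attached in a fresh session

# Hypothetical helpers, written here for illustration only
is_installed <- function(p) p %in% rownames(installed.packages())
is_attached  <- function(p) paste0("package:", p) %in% search()

is_installed(pkg)  # is the package present on this computer?
is_attached(pkg)   # is it attached in the current session?

library(parallel)  # load the package for this session only
is_attached(pkg)   # TRUE once library() has run
```

Restarting R drops the package from the session again, while the installation on disk stays in place. That is why library() has to be called in every new session.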
In the lower right-hand panel in RStudio there is a tab labelled Packages. Click on the tab, and you’ll see a list of packages that looks something like this:
You are unlikely to have exactly the same list of packages, as we have installed quite a few packages. Every row in this panel corresponds to a different package, and every column is a useful piece of information about that package.
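Similar information is available from the console. The sketch below assumes nothing beyond base R; the variable name installed is just an illustrative choice.

```r
# installed.packages() returns a character matrix with one row per package
installed <- as.data.frame(
  installed.packages()[, c("Package", "Version", "LibPath")],
  stringsAsFactors = FALSE
)

head(installed)          # first few installed packages
nrow(installed)          # how many packages are installed
packageVersion("stats")  # version of one particular package
```

The output will differ from machine to machine, just like the Packages tab.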
tidyverse
You should install some essential packages now that you will need in the Data Analytics courses. Copy and paste the following command into the R console to do this: install.packages("tidyverse")
This will take a while, as tidyverse is a collection of packages and R will have to install all dependencies.
tidyverse if you have a Mac
Unfortunately, installing the tidyverse isn’t always straightforward on macOS 10.14 Mojave, which was released on September 24, 2018.
To solve issues that may arise with a missing xml2 library, please do the following:
Be careful, as you do need two dashes before the install. A software update popup window should appear, asking if you want to install the command line developer tools. Click on “Install” (you don’t need to click on “Get Xcode”).
/usr/bin/ruby -e "$(curl -fsSL.)
, paste it into Terminal, and press Enter. This installs Homebrew, which is special software that lets you install Unix-y programs from the terminal.
libxml2
tidyverse
Once the tidyverse collection of packages installs and you get back to the R prompt >, you can install a series of packages that will be useful later in the course. You can copy/paste the code below; please note that this will take quite a while, so grab a coffee.
# install these packages as well
list_of_packages <- c(
  "moderndive",      # https://www.moderndive.com/
  "DT",              # Interactive HTML tables via the DataTables JavaScript library
  "unvotes",         # How countries have voted in UN resolutions
  "gridExtra",       # Miscellaneous functions for "Grid" graphics
  "GGally",          # Allows us to create a correlations/scatterplots matrix
  "tidyquant",       # Download and manipulate financial data
  "wbstats",         # Download World Bank data
  "eurostat",        # Download data from Eurostat
  "fpp2",            # Time series and forecasting functions, with data too
  "car",             # Applied regression; lets us calculate VIF (Variance Inflation Factor)
  "gapminder",       # Data on life expectancy, GDP/capita, and population by country and year
  "nycflights13",    # Data on all domestic flights through NYC's 3 airports (JFK, EWR, LGA) in 2013
  "fivethirtyeight", # Data used in articles that appeared on the fivethirtyeight.com website
  "corrr",           # correlation in R
  "plotly",          # interactive visualizations
  "sf",              # tidy geo-computing
  "cowplot",         # ggplot multiple figures addon
  "coefplot",        # plot coefficients from fitted models
  "interplot",       # plot effects of variables in interaction terms
  "scales",          # scale functions for visualisations
  "ggridges",        # ridgeline plots in ggplot2
  "skimr",           # nice dataframe summaries
  "leaflet",         # interactive maps
  "ggrepel",         # geoms for ggplot2 to repel overlapping text labels
  "viridis",         # Colour maps
  "rvest",           # scrape webpages
  "usethis",         # automation of package and project setup
  "devtools",        # installing packages from GitHub
  "tidytext",        # text mining
  "here",            # finding your files
  "mosaic"           # summary stats, using mosaic::favstats()
)
install.packages(list_of_packages, dependencies = TRUE, repos = "https://cran.rstudio.com/")
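If you run the command above a second time, R reinstalls everything. One possible way to avoid that is to install only the packages that are missing. The helper below, install_if_missing, is a hypothetical function written for this sketch and is not part of any package; it relies only on base R.

```r
# Hypothetical helper (not from any package): install only what is missing
install_if_missing <- function(pkgs, repos = "https://cran.rstudio.com/") {
  missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs, dependencies = TRUE, repos = repos)
  }
  invisible(missing_pkgs)  # return the names that were not yet installed
}

# Packages bundled with R are always installed, so nothing is downloaded here
not_yet_installed <- install_if_missing(c("stats", "utils"))
not_yet_installed
```

Calling install_if_missing(list_of_packages) would then skip anything that is already on your computer.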
Most of the time the packages that you’ll want to install have been made available on CRAN, the Comprehensive R Archive Network, so you use the install.packages("package_name") function. Sometimes people write packages that are not submitted to CRAN, and sometimes you might want to try out a package that is currently under development. In these situations, people who write packages will often make them available on GitHub. We can install packages directly from GitHub, using the devtools package.
The first thing you need to do is install devtools, which is easy because that package is available on CRAN, and hopefully you installed it along with the packages listed earlier. If not, run install.packages("devtools") now.
Once devtools is installed, you must explicitly tell R that you will be using it by typing library(devtools). Then you can use the install_github command to install a package directly from a GitHub repository. For example, there’s an R data package, put together by Sean Kross, featuring every Lego set from 1970 to 2015.
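A minimal sketch of that installation follows. It assumes the package lives in the GitHub repository seankross/lego; that repository name is an assumption on our part, so check it before running. The call needs an internet connection.

```r
# Assumption: the lego package is hosted at github.com/seankross/lego
lego_repo <- "seankross/lego"

if (requireNamespace("devtools", quietly = TRUE)) {
  library(devtools)          # tell R we will be using devtools
  install_github(lego_repo)  # fetch and install straight from GitHub
} else {
  message("devtools is not installed; run install.packages(\"devtools\") first")
}
```

install_github takes the repository in the form "user/repository", which is why no full URL is needed.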
R fetches and installs the package from GitHub, and we now have the new lego package to play with. To verify that everything worked properly, let’s load the lego package and look at its legosets dataframe:
library(tidyverse) # load the tidyverse for glimpse(), %>%, and ggplot()
library(lego) # load the lego package into the computer's memory
legosets # view the legosets dataframe
## # A tibble: 6,172 x 14
## Item_Number Name Year Theme Subtheme Pieces Minifigures Image_URL GBP_MSRP USD_MSRP CAD_MSRP EUR_MSRP Packaging Availability
## <chr> <chr> <int> <chr> <chr> <int> <int> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 10246 Detect~ 2015 Advan~ "Modular~ 2262 6 http://imag~ 133. 160. 200. 150. Box Retail - li~
## 2 10247 Ferris~ 2015 Advan~ "Fairgro~ 2464 10 http://imag~ 150. 200. 230. 180. Box Retail - li~
## 3 10248 Ferrar~ 2015 Advan~ "Vehicle~ 1158 NA http://imag~ 70.0 100. 120. 90.0 Box LEGO exclus~
## 4 10249 Toy Sh~ 2015 Advan~ "Winter ~ 898 NA http://imag~ 60.0 80.0 NA 70.0 Box LEGO exclus~
## 5 10581 Ducks 2015 Duplo "Forest ~ 13 1 http://imag~ 9.99 9.99 13.0 9.99 Box Retail
## 6 10582 Animals 2015 Duplo "Forest ~ 39 2 http://imag~ 17.0 20.0 25.0 20.0 Box Retail
## 7 10583 Fishin~ 2015 Duplo "Forest ~ 32 2 http://imag~ 20.0 25.0 30.0 25.0 Box Retail
## 8 10584 Forest 2015 Duplo "Forest ~ 105 3 http://imag~ 50.0 60.0 70.0 60.0 Box Retail
## 9 10585 Mom an~ 2015 Duplo "" 13 2 http://imag~ 8.99 9.99 13.0 9.99 Box Retail
## 10 10586 Ice Cr~ 2015 Duplo "" 11 2 http://imag~ 13.0 15.0 15.0 15.0 Box Retail
## # ... with 6,162 more rows
glimpse(legosets) #examine the structure of the dataframe- variables, observations, type of variables, etc.
## Rows: 6,172
## Columns: 14
## $ Item_Number <chr> "10246", "10247", "10248", "10249", "10581", "10582", "10583", "10584", "10585", "10586", "10587", "10589", "1...
## $ Name <chr> "Detective's Office", "Ferris Wheel", "Ferrari F40", "Toy Shop", "Ducks", "Animals", "Fishing Trip", "Forest",...
## $ Year <int> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 20...
## $ Theme <chr> "Advanced Models", "Advanced Models", "Advanced Models", "Advanced Models", "Duplo", "Duplo", "Duplo", "Duplo"...
## $ Subtheme <chr> "Modular Buildings", "Fairground", "Vehicles", "Winter Village", "Forest Animals", "Forest Animals", "Forest A...
## $ Pieces <int> 2262, 2464, 1158, 898, 13, 39, 32, 105, 13, 11, 52, 13, 29, 19, 26, 105, 38, 87, 63, 24, 47, 29, 19, 38, 17, 2...
## $ Minifigures <int> 6, 10, NA, NA, 1, 2, 2, 3, 2, 2, 3, 1, NA, NA, NA, NA, 1, 2, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, 2,...
## $ Image_URL <chr> "http://images.brickset.com/sets/images/10246-1.jpg", "http://images.brickset.com/sets/images/10247-1.jpg", "h...
## $ GBP_MSRP <dbl> 132.99, 149.99, 69.99, 59.99, 9.99, 16.99, 19.99, 49.99, 8.99, 12.99, 17.99, 12.99, 16.99, 12.99, 19.99, 44.99...
## $ USD_MSRP <dbl> 159.99, 199.99, 99.99, 79.99, 9.99, 19.99, 24.99, 59.99, 9.99, 14.99, 19.99, 14.99, 19.99, 14.99, 24.99, 49.99...
## $ CAD_MSRP <dbl> 200, 230, 120, NA, 13, 25, 30, 70, 13, 15, 25, 15, 25, 15, 30, 60, 30, 50, 45, 30, 40, 30, 20, 30, 15, 20, 25,...
## $ EUR_MSRP <dbl> 149.99, 179.99, 89.99, 69.99, 9.99, 19.99, 24.99, 59.99, 9.99, 14.99, 19.99, 14.99, 19.99, 14.99, 24.99, 59.99...
## $ Packaging <chr> "Box", "Box", "Box", "Box", "Box", "Box", "Box", "Box", "Box", "Box", "Box", "Box", "Box", "Box", "Box", "Box"...
## $ Availability <chr> "Retail - limited", "Retail - limited", "LEGO exclusive", "LEGO exclusive", "Retail", "Retail", "Retail", "Ret...
The dataframe has 14 variables (or columns) and 6,172 observations (rows). Besides the item number, year, theme/subtheme, and the number of pieces and minifigures contained in each Lego box, we also have the recommended retail prices in GBP, USD, CAD, and EUR. While we are at it, let us have a quick look at how Lego prices (in GBP) have evolved over the years.
avg_price_per_year <- legosets %>%      # create avg_price_per_year by taking legosets, and then
  filter(!is.na(GBP_MSRP)) %>%          # filter out entries with no GBP price (GBP_MSRP), and then
  group_by(Year) %>%                    # group prices by year, and then
  summarise(Price = mean(GBP_MSRP))     # create variable "Price" = yearly average of GBP_MSRP

ggplot(avg_price_per_year,
       mapping = aes(x = Year, y = Price)) +  # time series plot: x = Year, y = Price
  geom_point(size = 0.5) +                    # simple scatterplot of Y vs. X
  geom_line(size = 0.5) +                     # add the black line between points
  geom_smooth(se = FALSE) +                   # fit trend line; se = FALSE means no error band around it
  labs(x = "Year",
       y = "Price (GBP)",
       title = "Average price of LEGO sets",
       subtitle = "Amounts are reported in current GBP",
       caption = "Source: LEGO") +
  theme_bw()
There is a clear upward trend in average GBP prices.
And since we are talking about LEGOs, here is a fun application of creating LEGO mosaics from photos using R & the tidyverse
Every now and then the authors of packages release updated versions. The updated versions often add new functionality, fix bugs, and so on. It’s a good idea to update your packages periodically.
There’s an update.packages function, but it’s probably easier to stick with the RStudio tool. In the Packages tab, click on the Update Packages button. This will bring up a window that looks like the one shown below:
In this window, each row refers to a package that needs to be updated. You can select which updates to install by checking the boxes on the left. If you feel lazy, click the Select All button, and then Install Updates. This might take a while to complete depending on how fast your internet connection is.
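If you prefer the console, a rough equivalent of that window can be sketched with base R’s old.packages function. The variable names cran and outdated are illustrative, the mirror is the one used in the installation command earlier on this page, and the call needs an internet connection.

```r
cran <- "https://cran.rstudio.com/"  # same mirror as in the install command above

# List installed packages for which the repository offers a newer version.
# old.packages() returns NULL when there is nothing to update.
outdated <- old.packages(repos = cran)

if (!is.null(outdated)) {
  print(outdated[, c("Package", "Installed", "ReposVer"), drop = FALSE])
}
```

Running update.packages(ask = FALSE, repos = cran) afterwards would install all of the listed updates without prompting for each one, much like Select All followed by Install Updates.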
About twice a year, a new version of R is released, and the features of all packages get changed to be compatible with the new version of R. The side effect of packages being compatible with the newest R version is that when you update to the newest version of R, you lose all the packages that you have downloaded and installed. Unfortunately, you need to install the new versions of packages, even though they will typically behave just like the old ones.
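One way to make reinstalling less painful is to save the names of your installed packages before upgrading and read them back afterwards. This is only a sketch under stated assumptions: the file name installed_packages.rds is hypothetical, the file is written to the current working directory, and packages that came from GitHub rather than CRAN would still need to be reinstalled separately.

```r
# Hypothetical file name for the snapshot; any writable path would do
snapshot_file <- "installed_packages.rds"

# BEFORE upgrading R: record the add-on packages currently installed
# (priority = "NA" leaves out the base and recommended packages that ship with R)
my_packages <- rownames(installed.packages(priority = "NA"))
saveRDS(my_packages, snapshot_file)

# AFTER upgrading R: reinstall whatever is on the list but not yet installed
my_packages  <- readRDS(snapshot_file)
to_reinstall <- setdiff(my_packages, rownames(installed.packages()))
if (length(to_reinstall) > 0) {
  install.packages(to_reinstall, repos = "https://cran.rstudio.com/")
}
```

Run in a single session, as here, the second half finds nothing to reinstall. It only does real work when it is run in the new version of R.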
This page last updated on: 2020-07-14