3 Packages
3.1 The Essentials
To get ready for analysis we’ll need to install packages. We’ll grab packages from 3 sources, and we’ll need a special installer for 2 of these sources:
For Bioconductor packages we’ll need BiocManager
install.packages("BiocManager")
and for GitHub packages we’ll need devtools
install.packages("devtools")
Now let’s install most of the packages we’ll need. First, let’s get the CRAN packages
# Install package from CRAN if not already installed
if(!require(tidyverse)) install.packages("tidyverse")
if(!require(here)) install.packages("here")
if(!require(readxl)) install.packages("readxl")
if(!require(cowplot)) install.packages("cowplot")
if(!require(foreach)) install.packages("foreach")
if(!require(reshape2)) install.packages("reshape2")
if(!require(RColorBrewer)) install.packages("RColorBrewer")
if(!require(rstudioapi)) install.packages("rstudioapi")
if(!require(fBasics)) install.packages("fBasics")
if(!require(scales)) install.packages("scales")
if(!require(grDevices)) install.packages("grDevices") # grDevices ships with base R, so this should never need installing
if(!require(ggtext)) install.packages("ggtext")
# limma is a Bioconductor package, so it is installed with BiocManager below rather than from CRAN
if(!require(viridis)) install.packages("viridis")
if(!require(colorspace)) install.packages("colorspace")
if(!require(uwot)) install.packages("uwot")
then the Bioconductor packages
# So we can install the Bioconductor packages we'll need
if(!require(flowCore)) BiocManager::install("flowCore")
# The other packages we need are:
Bioconductor_Packages <- c("ggcyto", "FlowSOM", "ConsensusClusterPlus", "flowCut", "limma")
BiocManager::install(Bioconductor_Packages)
# You may get asked to update other packages, choose 'a' for all.
# If you are asked to install packages from source choose 'yes'.
and finally the devtools packages.
library(devtools)
if(!require(premessa)) install_github("ParkerICI/premessa", force = TRUE)
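The repeated if(!require(...)) lines above can also be collapsed into a single check. The following is only a minimal sketch, not part of the original workflow: missing_packages() is a hypothetical helper name, and the install call is left commented out so that nothing is downloaded unless you decide to run it.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
# requireNamespace() checks availability without attaching the package.
missing_packages <- function(pkgs) {
  is_installed <- vapply(
    pkgs,
    function(p) requireNamespace(p, quietly = TRUE),
    logical(1)
  )
  pkgs[!is_installed]
}

cran_packages <- c("tidyverse", "here", "readxl", "cowplot")
to_install <- missing_packages(cran_packages)

# Only call install.packages() when something is actually missing
if (length(to_install) > 0) {
  message("Missing: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)
}
```

Unlike require(), this sketch does not attach the packages, so you would still call library() for each package you want to use.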
3.2 A Deeper Dive
3.2.1 Installing and Managing Packages
3.2.1.1 What is a package?
A package is a collection of files and scripts that either extends the functionality of R (think adding tSNE or UMAP) so that we can do more, or wraps existing R functions into easier-to-use commands for particular types of analysis (think cytofkit or the tidyverse) so that users can do more, more simply and more quickly.
3.2.2 What forms do packages take?
Packages are distributed in different forms. R might occasionally ask which form of the package you want, e.g. “Do you want to install the package from Source?”. Here, we’ll explain what that means and define the other forms in which a package can be distributed.
3.2.2.1 Source
A source package is just a directory of files with a specific structure. It includes particular components, such as a DESCRIPTION file, an R/ directory containing .R files etc. Here’s what the R package for CATALYST looks like on my system:
Source files are not “ready-to-work” and must be compiled by your computer into a working format. This often requires additional tools installed on your computer. For instance, on Windows machines you will need to install Rtools, and on macOS you will need Xcode. This was covered in Chapter 1. Because we have set up our environment to be ready for source files we CAN install from Source when necessary.
3.2.2.2 Bundled
A bundled package is a package that’s been compressed into a single file. By convention (from Linux), package bundles in R use the extension .tar.gz and are sometimes referred to as “source tarballs”. This means that multiple files have been reduced to a single file (.tar) and then compressed using gzip (.gz). While a bundle is not that useful on its own, it’s a platform-agnostic, transportation-friendly intermediary between a source package and an installed package.
3.2.2.3 Binary
If you want to distribute your package to an R user who doesn’t have package development tools, you’ll need to provide a binary package. Like a package bundle, a binary package is a single file. Unlike a bundled package, a binary package is platform specific and there are two basic flavors: Windows and macOS. (Linux users are generally required to have the tools necessary to install from .tar.gz files.) Binary packages for macOS are stored as .tgz, whereas Windows binary packages end in .zip.
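As a rough illustration of the file-name conventions described above, the form of a downloaded package file can be guessed from its extension. This is a sketch only: package_form() is a hypothetical helper, it handles one file name at a time, and it relies purely on the extension rather than inspecting the file.

```r
# Hypothetical helper: guess the distribution form from the file extension,
# following the conventions described in the text.
package_form <- function(file) {
  if (grepl("\\.tar\\.gz$", file)) {
    "source bundle"
  } else if (grepl("\\.tgz$", file)) {
    "macOS binary"
  } else if (grepl("\\.zip$", file)) {
    "Windows binary"
  } else {
    "unknown"
  }
}

package_form("viridisLite_0.4.0.tar.gz")
package_form("viridisLite_0.4.0.zip")
```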
3.2.2.4 Installed
An installed package is a binary package that’s been decompressed into a package library. The installed package is ready to be used but is not necessarily active without being loaded into memory. There are several ways to install a package, mainly because there are several package repositories that provide packages. We will use 3 different repositories for our analyses. The repositories are different based on the specialty of the packages, how thoroughly they are tested and how well they are maintained.
3.2.2.4.1 CRAN
The Comprehensive R Archive Network (CRAN) is the central repository for R source code. It’s where we downloaded the latest version of R from, and it hosts the most popular general purpose packages. Examples are things like ggplot2 for graphing and knitr for document generation. Packages from CRAN must conform to a specific format before they are uploaded and are tested daily on several machines to make sure that everything works correctly. These packages have become central to the R user experience and are generally the most stable, the easiest to use and the easiest to install.
3.2.2.4.1.1 How to install a package from CRAN
We can install packages from CRAN with the install.packages() command. This will download and install the package.
install.packages("viridisLite")
#> Installing package into 'C:/Users/dcarragher/AppData/Local/R/win-library/4.3'
#> (as 'lib' is unspecified)
#> package 'viridisLite' successfully unpacked and MD5 sums checked
#>
#> The downloaded binary packages are in
#> C:\Users\dcarragher\AppData\Local\Temp\RtmpKuSlpz\downloaded_packages
Note that the package name must be between quotation marks ("") or you’ll get an error:
install.packages(viridis)
If you want to install multiple CRAN packages at once you can concatenate them with c()
install.packages(c("viridis", "viridisLite", "scales"), repos="http://cran.us.r-project.org")
#> package 'viridis' successfully unpacked and MD5 sums checked
#> package 'viridisLite' successfully unpacked and MD5 sums checked
#> package 'scales' successfully unpacked and MD5 sums checked
#>
#> The downloaded binary packages are in
#> C:\Users\dcarragher\AppData\Local\Temp\RtmpKuSlpz\downloaded_packages
install.packages() is very streamlined and will work like this for most packages, but it’s good to be aware of some of the options that are available to you.
3.2.2.4.1.2 Where are packages saved?
Let’s talk about where the packages are saved. If you have never set up R before, R will choose to create a library (lib) somewhere for you. You can discover where R has put the libraries with
.libPaths()
#> [1] "C:/Users/dcarragher/AppData/Local/R/win-library/4.3"
#> [2] "C:/Program Files/R/R-4.3.1/library"
As you can see, on Windows machines we end up with 2 libraries:
- A System library where the base packages are installed alongside the R installation
- A User library for the currently logged in user, stored in their Windows user area.
Most of the time, packages are installed to the appropriate library, but if you prefer to install a package to the other library you can choose this with the lib option.
remove.packages("viridisLite") # uninstall and remove package from computer
install.packages("viridisLite", lib = "C:/Users/dcarragher/AppData/Local/R/win-library/4.3") # re-install to specific library (path taken from .libPaths() above)
remove.packages("viridisLite") # remove again...
install.packages("viridisLite", lib = "C:/Program Files/R/R-4.3.1/library") # ... and install into different library
# Note: if you have problems with read/write capability to the Program Files library try running RStudio in Administrator mode
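To check which library an installed package actually lives in, base R offers find.package() and the lib.loc argument of installed.packages(). The sketch below uses the stats package purely because it ships with R and is therefore always present; substitute any package you have installed.

```r
# Directory of an installed package, and the library that contains it
pkg_dir <- find.package("stats")  # 'stats' ships with R, so it is always present
lib_dir <- dirname(pkg_dir)

pkg_dir
lib_dir

# List the packages installed in that one library only
pkgs_in_lib <- rownames(installed.packages(lib.loc = lib_dir))
head(pkgs_in_lib)
```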
3.2.2.4.1.3 Where can we install packages from?
CRAN is mirrored all over the world, so it is possible to get packages regardless of your location or any local internet outages. We can check which repo is set as default with
options("repos")
#> $repos
#> CRAN
#> "http://cran.rstudio.com/"
As you can see we’re using the RStudio repo, which tends to be up to date. You can see a list of the available CRAN mirror repos here. Using this list we can, if we want (or need) to, download CRAN packages from, for example, Toronto.
library(viridis) # Load package into memory (this also attaches viridisLite)
detach("package:viridis", unload=TRUE) # Unload viridis first, because it depends on viridisLite
detach("package:viridisLite", unload=TRUE) # Unload package
remove.packages("viridisLite") # Remove package from computer
install.packages("viridisLite", repos = "https://cran.utstat.utoronto.ca/") # Re-install from different repo
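Rather than passing repos to every install.packages() call, the default mirror can be changed for the whole session with options(). This is a minimal sketch; the mirror URL used here is just an illustrative choice and not one the text above prescribes.

```r
# Remember the current setting so it can be restored afterwards
old_repos <- getOption("repos")

# Set the default CRAN mirror for this session (URL is an illustrative choice)
options(repos = c(CRAN = "https://cloud.r-project.org"))
current_mirror <- getOption("repos")[["CRAN"]]
current_mirror

# install.packages("viridisLite")  # would now use the mirror set above

# Restore the previous setting
options(repos = old_repos)
```

Restoring the old value at the end keeps the change from leaking into the rest of your session.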
Sometimes we might not have access to CRAN, either due to internet outages or overzealous firewall protocols. It is possible to install packages from local files (remember the .tar.gz files discussed above). You can download these files from CRAN. For this we will need to set repos = NULL (without quotation marks) and tell R where the local file is. On Windows we will use the .zip file.
library(viridisLite)
library(viridis)
detach("package:viridis", unload=TRUE)
detach("package:viridisLite", unload=TRUE) # Unload package or we won't be able to remove
remove.packages("viridisLite") # Remove package from computer
install.packages("C:/Users/dcarragher/Downloads/viridisLite_0.4.0.zip", repos = NULL, type = "win.binary") # Install zipped package from wherever you've saved it - Note the `/` forward slashes used in R rather than `\` backslashes.
The packages for local installation can be downloaded from CRAN:
It’s useful to know this as some strong firewalls will not allow installation of packages directly from CRAN.
Let’s install the packages we’ll need from CRAN for this analysis (if you haven’t already done so)
install.packages(c("here", "tidyverse", "readxl", "cowplot", "foreach", "reshape2", "RColorBrewer", "rstudioapi", "scales", "ggtext", "viridis", "colorspace", "uwot"))
3.2.2.4.2 Bioconductor
Bioconductor is a central repository for R packages associated with analysis of bio-sciences data.
“The mission of the Bioconductor project is to develop, support, and disseminate free open source software that facilitates rigorous and reproducible analysis of data from current and emerging biological assays.”
Similarly to CRAN, Bioconductor tests all of the packages uploaded there to ensure that they are complete and free from errors. Furthermore, Bioconductor pioneered its own data objects, such as the SingleCellExperiment (sce), that are commonly used across a range of data analyses from genomics to cytometry. In fact, you might often find that objects from one package can be used with other packages.
3.2.2.4.2.1 How to install a package from Bioconductor
For Bioconductor packages, we first need to download the Bioconductor package manager from CRAN
install.packages("BiocManager")
#> Installing package into 'C:/Users/dcarragher/AppData/Local/R/win-library/4.3'
#> (as 'lib' is unspecified)
#> package 'BiocManager' successfully unpacked and MD5 sums checked
#>
#> The downloaded binary packages are in
#> C:\Users\dcarragher\AppData\Local\Temp\RtmpKuSlpz\downloaded_packages
Once BiocManager is downloaded we can use it to install Bioconductor packages with the install() function.
BiocManager::install("flowCore")
As you can see, like install.packages() we need to enclose the package name in "" marks. Of note, the BiocManager:: prefix means that we specifically want to use the BiocManager version of install() rather than, for instance, the devtools version, devtools::install(). With common function names like install, it’s often useful to state the package before the function to ensure you use the correct one.
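The effect of the package:: prefix can be seen without installing anything. In this small sketch we deliberately define our own function called median, which hides the one from the stats package; the prefix still reaches the original. The masking function is made up for illustration only.

```r
# A user-defined function that happens to share a name with stats::median()
median <- function(x) "my own median()"

own_result <- median(c(1, 5, 9))         # calls the function defined above
pkg_result <- stats::median(c(1, 5, 9))  # explicitly calls the version from 'stats'

own_result
pkg_result

rm(median)  # remove the masking function again
```

The same logic applies to BiocManager::install() and devtools::install(): the prefix states which package's function you mean, whatever else is loaded.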
Now let’s install the other Bioconductor packages we’ll need. We can use the c() concatenate function to install all of them at once.
BiocManager::install(c("ggcyto", "FlowSOM", "ConsensusClusterPlus", "flowCut", "limma"), force = TRUE)
I have had to add the , force = TRUE option at the end of the package list because I already have the packages installed. If you don’t have these packages you can omit this option.
3.2.2.4.3 GitHub
The advantage of CRAN and Bioconductor packages is that they have been thoroughly tested and are actively maintained. However, maintaining packages requires a lot of effort and time. Given that these open source packages are often written by small research labs, the authors often don’t have the time or money to keep maintaining a package indefinitely. Therefore, some packages are never made available on CRAN or Bioconductor, and others eventually drop off those repos as they begin to fail the rigorous tests. A really good example of this is cytofkit, which is a wonderful piece of analysis software that used to be on Bioconductor. Eventually, it dropped off Bioconductor because it no longer passed the Bioconductor testing protocol and the lab didn’t have the time to make it pass. Instead of just disappearing, the authors made it available through a code-sharing repository, namely GitHub.
GitHub is an online code-saving and sharing repository that anyone can get an account on. This makes it useful for sharing things like packages. This also means you should be careful about which packages you download and use, as they could have come from literally anywhere. I recommend only using packages (or other code) referenced in a journal article.
3.2.2.4.3.1 How to install a package from GitHub
For GitHub packages we will need to download the devtools package, which allows us to install packages from GitHub.
install.packages("devtools")
Once we have installed devtools we can get the GitHub packages we’ll need.
devtools::install_github("ParkerICI/premessa")
A GitHub package reference needs 2 parts: the name of the GitHub account holder and the name of the package. The two parts are separated by a / and the whole reference is enclosed in "" marks.
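To make the two parts of the reference explicit, the string can be split on the / with base R. This is a sketch for illustration only: it assumes the repository name matches the package name, which holds for premessa but is not guaranteed for every GitHub repository.

```r
repo <- "ParkerICI/premessa"

# Split "account/package" into its two parts
parts <- strsplit(repo, "/", fixed = TRUE)[[1]]
account <- parts[1]  # GitHub account holder
package <- parts[2]  # repository (and, we assume, package) name

account
package

# Install only if the package is not already available
# if (!requireNamespace(package, quietly = TRUE)) devtools::install_github(repo)
```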
3.2.2.5 In memory
Packages in R must not only be installed; R must also be told that we want to use them for our current analysis. We’ve cheated a bit so far by using package_name:: to call functions from packages without loading them. If we are going to use a package extensively it’s better to load it into memory. In memory means the packages are actively working in R and available to be used for your analysis.
3.2.2.5.1 library()
We primarily call packages to memory with the library() command.
library(flowCore)
Note that there is no need for "" marks around the name of the package being loaded, but you can use them if you want.
library("ggcyto")
#> Loading required package: ggplot2
#> Loading required package: ncdfFlow
#> Loading required package: BH
#> Loading required package: flowWorkspace
#> As part of improvements to flowWorkspace, some behavior of
#> GatingSet objects has changed. For details, please read the section
#> titled "The cytoframe and cytoset classes" in the package vignette:
#>
#> vignette("flowWorkspace-Introduction", "flowWorkspace")
By convention (and convenience) we typically omit the "".
You should use library() to load all of your packages because it generates an obvious error message if you’ve misspelled the package or the package doesn’t exist. Importantly, an error in R stops everything that is happening, which makes problems very obvious.
library(fhlowCore) # note the typo
#> Error in library(fhlowCore): there is no package called 'fhlowCore'
A rule of programming suggests it’s better to fail early (if you have to fail at all!). It makes it very clear that a problem has occurred, which you can then correct - in this case fix the typo to call the correct package into memory.
3.2.2.5.2 require()
You may see that require() is used to call packages to memory. In fact, if you check the “Essentials” section of this chapter you’ll see that I used it. So why would we use require() when library() is superior? The simple answer is that library() gives an error and require() gives a warning. The error stops any subsequent commands from proceeding.
library(fhlowCore) # typo package name
#> Error in library(fhlowCore): there is no package called 'fhlowCore'
# When this is run as a script, the error halts execution, so the next two lines never run
a <- 25+2
a *3
whereas require()
require(fhlowCore) #typo package name
a <- 25+2
a *3
#> [1] 81
allows subsequent commands to proceed (because it only gives a warning, not an error). This allows us to use require() to check whether a package is present, load it if it is, or install it if not:
# Load package to memory if installed, otherwise install from cran
if(!require(viridis)) install.packages("viridis")
If we uninstall the above package you’ll see that it gets re-installed as necessary
# Load package to memory if installed, otherwise install from cran
remove.packages("viridis")
if(!require(viridis)) install.packages("viridis")
You’ll see this format used to install packages because it offers the flexibility to either call a package to memory OR install it.
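The difference between the two functions can be checked directly by capturing what each one does with a package that does not exist. This sketch assumes the misspelled name fhlowCore is not installed on your machine; tryCatch() is used only so that the error can be inspected rather than stopping the script.

```r
pkg <- "fhlowCore"  # deliberately misspelled; assumed not to be installed

# require() returns FALSE (with a warning) instead of stopping
loaded <- suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
loaded

# library() signals an error, which tryCatch() can intercept
result <- tryCatch(
  {
    library(pkg, character.only = TRUE)
    "loaded"
  },
  error = function(e) "error"
)
result
```

Because require() hands back TRUE or FALSE, it can sit inside an if() condition, which is exactly what the install-if-missing pattern above relies on.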
3.2.3 Maintaining your packages
Remember, you can manage and maintain your packages using the Packages tab in the Files pane. From the command line we can check which packages are installed with the code below.
Note: This can take some time if you have lots of packages.
library(tidyverse) # load tidyverse
installed.packages() %>% # create list of all installed packages
  as_tibble() %>% # turn list into table (as_tibble() replaces the deprecated as.tibble())
  head(15) %>% # look at the first 15 entries in the table
  kableExtra::kable() # create pretty table
We can also turn the data into a table
You’ll get a list of all packages, where they are installed and what version you’re using.
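If you only want the key columns, and would rather not depend on the tidyverse for this check, a base R version is possible. This is a sketch under the assumption that package name, library path and version are the columns of interest; pkg_info is just an illustrative variable name.

```r
# installed.packages() returns a character matrix with one row per package
pkg_info <- as.data.frame(
  installed.packages()[, c("Package", "LibPath", "Version"), drop = FALSE],
  stringsAsFactors = FALSE
)
rownames(pkg_info) <- NULL
head(pkg_info, 15)

# Version of a single package ('stats' ships with R)
packageVersion("stats")
```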
3.2.3.1 Updating Packages
As different versions of R are released, package developers update their packages to keep them working with the latest version. You can update packages with
update.packages() # you may have to answer several yes/no prompts
3.2.3.2 Removing Packages
Should you need to remove a package you can do so with
remove.packages("viridis")
If you are having trouble updating a package it might be better to remove the package and then re-install it (remember to restart your R session after uninstalling the package).
3.2.3.3 How to learn about the packages we’ve installed
Often you’ll get a recommendation to install a package without knowing much about it. How can you learn about a package’s use and functions?
Firstly, we’ve already seen how
?viridis
#> No documentation for 'viridis' in specified packages and libraries:
#> you could try '??viridis'
will bring up some relevant description in the Help tab of the Files pane.
Secondly, you can use
packageDescription("viridis")
#> Package: viridis
#> Type: Package
#> Title: Colorblind-Friendly Color Maps for R
#> Version: 0.6.4
#> Date: 2023-07-19
#> Authors@R: c( person("Simon", "Garnier", email =
#> "garnier@njit.edu", role = c("aut", "cre")),
#> person("Noam", "Ross", email =
#> "noam.ross@gmail.com", role = c("ctb", "cph")),
#> person("Bob", "Rudis", email = "bob@rud.is",
#> role = c("ctb", "cph")), person("Marco",
#> "Sciaini", email = "sciaini.marco@gmail.com",
#> role = c("ctb", "cph")), person("Antônio
#> Pedro", "Camargo", role = c("ctb", "cph")),
#> person("Cédric", "Scherer", email =
#> "scherer@izw-berlin.de", role = c("ctb",
#> "cph")) )
#> Maintainer: Simon Garnier <garnier@njit.edu>
#> Description: Color maps designed to improve graph
#> readability for readers with common forms of
#> color blindness and/or color vision deficiency.
#> The color maps are also perceptually-uniform,
#> both in regular form and also when converted to
#> black-and-white for printing. This package also
#> contains 'ggplot2' bindings for discrete and
#> continuous color and fill scales. A lean
#> version of the package called 'viridisLite'
#> that does not include the 'ggplot2' bindings
#> can be found at
#> <https://cran.r-project.org/package=viridisLite>.
#> License: MIT + file LICENSE
#> Encoding: UTF-8
#> Depends: R (>= 2.10), viridisLite (>= 0.4.0)
#> Imports: ggplot2 (>= 1.0.1), gridExtra
#> Suggests: hexbin (>= 1.27.0), scales, MASS, knitr,
#> dichromat, colorspace, httr, mapproj, vdiffr,
#> svglite (>= 1.2.0), testthat, covr, rmarkdown,
#> maps, terra
#> LazyData: true
#> VignetteBuilder: knitr
#> URL: https://sjmgarnier.github.io/viridis/,
#> https://github.com/sjmgarnier/viridis/
#> BugReports:
#> https://github.com/sjmgarnier/viridis/issues
#> RoxygenNote: 7.2.3
#> NeedsCompilation: no
#> Packaged: 2023-07-19 13:52:50 UTC; simon
#> Author: Simon Garnier [aut, cre], Noam Ross [ctb,
#> cph], Bob Rudis [ctb, cph], Marco Sciaini [ctb,
#> cph], Antônio Pedro Camargo [ctb, cph], Cédric
#> Scherer [ctb, cph]
#> Repository: CRAN
#> Date/Publication: 2023-07-22 12:50:02 UTC
#> Built: R 4.3.1; ; 2023-10-15 02:51:07 UTC; windows
#>
#> -- File: C:/Users/dcarragher/AppData/Local/R/win-library/4.3/viridis/Meta/package.rds
provides some more info about what the package does, who wrote it and where it is saved.
Finally, there are vignettes, accessed via
browseVignettes("viridis")
which will open up any vignettes written for a package in your default web browser. Vignettes provide an overview of the package, what the important functions do as well as some example code using those functions to get you started. They are a good way to understand what the package author intended for the package and to see how they have used the included code.
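The description can also be queried programmatically, which is handy when you only need one field. The sketch below uses the stats package because it ships with R; getNamespaceExports() is a base R function that lists what a package makes available, shown here as one possible way to get a quick overview of its functions.

```r
desc <- packageDescription("stats")  # 'stats' ships with R
desc$Package
desc$Version

# Look up a single field directly
packageDescription("stats", fields = "License")

# A quick look at some of the functions the package exports
exports <- sort(getNamespaceExports("stats"))
head(exports, 10)
```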
3.2.4 Conclusion
- Install CRAN packages with install.packages("") (remember the quotation marks)
- Call packages to memory with library() (quotation marks optional)
  - require() should not be used because it won’t throw an error - you won’t know if it worked or not!
  - if(!require(viridis)) install.packages("viridis") is still used in examples and demos
- install.packages("BiocManager") and BiocManager::install("") are required to install Bioconductor packages.
- install.packages("devtools") and devtools::install_github("") are required to install GitHub packages
- Maintain your packages using the Packages tab of RStudio (or use update.packages())
- Get package help with
  - ?packageName (for general package help)
  - packageDescription("packageName") (for author info and a brief description)
  - browseVignettes("packageName") for function overviews and code demonstrations straight from the author