Chapter 1 Introduction

If you are reading this book, you probably already appreciate the importance of visualizing data. It is an essential component of any data analysis. While we generally accept the old adage that “a picture is worth a thousand words”, it is worthwhile taking a moment to consider why.

Humans are remarkably capable of discerning patterns in visual data. This allows us to discover relationships, identify unusual or mistaken values, determine trends and differences, and with some effort, understand the relationships among several variables at once.

Additionally, data visualizations tend to have a greater cognitive and emotional impact than either text descriptions or tables of numbers. This makes them a key ingredient in both storytelling and the crafting of persuasive arguments.

Because graphs are so compelling, researchers and data scientists have an ethical obligation to create visualizations that fairly and accurately reflect the information contained in the data. The goal of this book is to provide you with the tools to both select and create graphs that present data as clearly, understandably, and accurately (honestly) as possible.

The R platform (R Core Team 2023) provides one of the most comprehensive sets of tools for accomplishing these goals. The software is open source, freely available, runs on almost any platform, is highly customizable, and is supported by a massive world-wide user base. The tools described in this book should allow you to create almost any type of data visualization desired.

Currently, the most popular approach to creating graphs in R uses the ggplot2 package (Wickham et al. 2023). Based on a Grammar of Graphics (Wilkinson and Wills 2005), the ggplot2 package provides a coherent and extensible system for data visualization and is the central approach used in this book. Since its release, a number of additional packages have been developed to enhance and expand the types of graphs that can easily be created with ggplot2. Many of these are explored in later chapters.

1.1 How to use this book

I hope that this book will provide you with a comprehensive overview of data visualization. However, you don’t need to read this book from start to finish in order to start building effective graphs. Feel free to jump to the section that you need and then explore others that you find interesting.

Graphs are organized by

  • the number of variables to be plotted
  • the type of variables to be plotted
  • the purpose of the visualization

Chapter          Description
Ch 2             provides a quick overview of how to get your data into R and how to prepare it for analysis.
Ch 3             provides an overview of the ggplot2 package.
Ch 4             describes graphs for visualizing the distribution of a single categorical (e.g. race) or quantitative (e.g. income) variable.
Ch 5             describes graphs that display the relationship between two variables.
Ch 6             describes graphs that display the relationships among 3 or more variables. It is helpful to read chapters 4 and 5 before this chapter.
Ch 7             provides a brief introduction to displaying data geographically.
Ch 8             describes graphs that display change over time.
Ch 9             describes graphs that can help you interpret the results of statistical models.
Ch 10            covers graphs that do not fit neatly elsewhere (every book needs a miscellaneous chapter).
Ch 11            describes how to customize the look and feel of your graphs. If you are going to share your graphs with others, be sure to check it out.
Ch 12            covers how to save your graphs. Different formats are optimized for different purposes.
Ch 13            provides an introduction to interactive graphics.
Ch 14            gives advice on creating effective graphs and where to go to learn more. It’s worth a look.
The Appendices   describe each of the datasets used in this book and provide a short blurb about the author and the Wesleyan QAC.

There is no one right graph for displaying data. Check out the examples, and see which type best fits your needs.

1.2 Prerequisites

It’s assumed that you have some experience with the R language and that you have already installed R and RStudio. If not, here are two excellent resources for getting started:

Either of these resources will help you familiarize yourself with R quickly.

1.3 Setup

In order to create the graphs in this book, you’ll need to install a number of optional R packages. Most of these packages are hosted on the Comprehensive R Archive Network (CRAN). To install all of these CRAN packages, run the following code in the RStudio console window.

CRAN_pkgs <- c("ggplot2", "dplyr", "tidyr", "mosaicData",
               "carData", "VIM", "scales", "treemapify",
               "gapminder", "sf", "tidygeocoder", "mapview",
               "ggmap", "osmdata", "choroplethr",
               "choroplethrMaps", "lubridate", "CGPfunctions",
               "ggcorrplot", "visreg", "gcookbook", "forcats",
               "survival", "survminer", "car", "rgl",
               "ggalluvial", "ggridges", "GGally", "superheat",
               "waterfalls", "factoextra", "networkD3",
               "ggthemes", "patchwork", "hrbrthemes", "ggpol",
               "quantmod", "gghighlight", "leaflet", "ggiraph",
               "rbokeh", "ggalt")
install.packages(CRAN_pkgs)
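
If some of these packages are already on your machine, you may prefer not to reinstall them. The following is a minimal sketch of one way to install only what is missing. The helper names missing_pkgs and install_missing are hypothetical (they are not part of R or of this book's code); installed.packages() and setdiff() are base R functions.

```r
# Hypothetical helper: which of the wanted packages are not yet installed?
# `installed` defaults to the names of all packages in the current library paths.
missing_pkgs <- function(wanted,
                         installed = rownames(installed.packages())) {
  setdiff(wanted, installed)
}

# Hypothetical helper: install only the missing packages.
# Returns (invisibly) the names it attempted to install.
install_missing <- function(wanted) {
  to_install <- missing_pkgs(wanted)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Illustration with a made-up "installed" set, so nothing is downloaded here
missing_pkgs(c("ggplot2", "dplyr", "sf"), installed = c("dplyr", "stats"))
```

With the package vector defined as above, calling install_missing(CRAN_pkgs) would then attempt to install only the packages that are absent.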

Alternatively, you can install a given package the first time it is needed.

For example, if you execute

library(gapminder)

and get the message

Error in library(gapminder) : there is no package called ‘gapminder’

you know that the package has never been installed. Simply execute

install.packages("gapminder")

once and

library(gapminder)

will work from that point on.
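
The same check can be made in code rather than by reading the error message. The sketch below is one possible approach, not a required step. The helper name use_package is hypothetical; it relies on base R's requireNamespace(), which returns FALSE instead of raising an error when a package is not installed.

```r
# Hypothetical helper: attach a package, installing it first if it is absent.
use_package <- function(pkg) {
  # requireNamespace() returns FALSE (no error) when the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() accept the name as a string
  library(pkg, character.only = TRUE)
}

# Example with a package that ships with R, so no download is triggered
use_package("tools")
```

Calling use_package("gapminder") would follow the same path as the manual steps above: install once if needed, then load.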

A few specialized packages used later in the book are only hosted on GitHub. You can install them using the install_github function in the remotes package. First install the remotes package from CRAN.

install.packages("remotes")

Then run the following code to install the remaining packages.

github_pkgs <- c("rkabacoff/ggpie", "hrbrmstr/waffle",
                 "ricardo-bion/ggradar", "ramnathv/rCharts",
                 "Mikata-Project/ggthemr")
remotes::install_github(github_pkgs, dependencies = TRUE)
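
To confirm that the GitHub installs succeeded, one option is to check each package by name. The sketch below assumes that each package name matches the repository name after the slash, which is common but not guaranteed in general. The helper name repo_to_pkg is hypothetical.

```r
github_pkgs <- c("rkabacoff/ggpie", "hrbrmstr/waffle",
                 "ricardo-bion/ggradar", "ramnathv/rCharts",
                 "Mikata-Project/ggthemr")

# Hypothetical helper: take the part after the slash as the package name.
# Assumption: repository name == package name.
repo_to_pkg <- function(repos) basename(repos)

pkg_names <- repo_to_pkg(github_pkgs)

# TRUE for each package whose namespace can be loaded, FALSE otherwise
installed_ok <- vapply(pkg_names, requireNamespace,
                       logical(1), quietly = TRUE)
installed_ok

# Names of any packages that still need attention
names(installed_ok)[!installed_ok]
```

Any package listed by the last line can be reinstalled individually with remotes::install_github().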

Although it may seem like a lot, these packages should install fairly quickly. And again, you can install them individually as needed.

At this point, you should be ready to go. Let’s get started!

References

R Core Team. 2023. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, and Dewey Dunnington. 2023. ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.
Wilkinson, Leland, and Graham Wills. 2005. The Grammar of Graphics. 2nd ed. Statistics and Computing. New York: Springer.