Modern Data Viz with R

Rob Kabacoff

Goals

  • Understand the ggplot2 “grammar of graphics”
  • Survey popular graph types and discuss when to use each
  • Create new, innovative graphs
  • Learn how to customize your work

Schedule - Morning

Time                  Activity
09:05 - 10:00 AM      Session I - Intro to R and ggplot2
10:00 - 10:15 AM      Break
10:15 - 11:15 AM      Session II - Univariate graphs
11:15 - 11:30 AM      Break
11:30 AM - 12:30 PM   Session III - Bivariate and Multivariate graphs
12:30 - 1:30 PM       Lunch Break + Talk
                      SCASA 2022 Regional Statistics Data Visualization Poster Competition

Schedule - Afternoon

Time                  Activity
1:30 - 2:30 PM        Session IV - Time series, interactive graphs, and maps
2:30 - 2:40 PM        Break
2:40 - 3:40 PM        Session V - Case Study
3:40 - 4:00 PM        Book Raffle and Wrap-up

Before we jump into visualizing data

Importing data

Packages

  • readr - read text files
  • readxl - read Excel files
  • haven - read SAS, IBM SPSS, and Stata files
  • rvest - scrape web pages

Access databases with RMySQL, ROracle, RPostgreSQL, RSQLite

Use the load() function to open R data files.

Use data() to load data sets that ship with packages.
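
As a rough illustration of the import options above, the sketch below writes a small CSV to a temporary file and reads it back with readr, then round-trips an R data file with save() and load(), and finally pulls a data set out of a package with data(). The temporary files and the use of mtcars and mpg are assumptions made only for this example. For other formats, readxl::read_excel() and haven::read_sas(), read_sav(), and read_dta() follow the same pattern of passing a file path.

```r
library(readr)

# Hypothetical text file, created here only so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
write_csv(head(mtcars), csv_path)

# Read the text file into a data frame (tibble)
cars <- read_csv(csv_path, show_col_types = FALSE)

# Save the data frame to an R data file, remove it, and restore it with load()
rdata_path <- tempfile(fileext = ".RData")
save(cars, file = rdata_path)
rm(cars)
load(rdata_path)  # recreates the object `cars`

# Load a data set that ships with a package
data(mpg, package = "ggplot2")
```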

Before we jump into visualizing data

Data management with the dplyr package

Verbs

  • select - select variables (columns)
  • filter - select observations (rows)
  • mutate - create new or transform existing variables
  • summarize - summarize data (counts, means, etc.)

  • group_by - carry out actions within subgroups
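
A minimal sketch of how these verbs chain together, using the built-in mtcars data set as a stand-in (an assumption for this example, not part of the workshop materials). The derived variable kpl and the approximate conversion factor are illustrative only.

```r
library(dplyr)

result <- mtcars %>%
  select(mpg, cyl, wt) %>%        # keep three variables (columns)
  filter(wt < 4) %>%              # keep lighter cars (rows)
  mutate(kpl = mpg * 0.425) %>%   # new variable: approximate km per liter
  group_by(cyl) %>%               # work within cylinder groups
  summarize(n = n(),              # count and mean per group
            mean_kpl = mean(kpl))

print(result)
```

Each verb takes a data frame and returns a data frame, which is what lets the steps be piped one after another.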

A case study

Current Population Survey (United States Census)

  • How does hourly wage relate to years of experience?

  • Is this relationship constant when considering other factors?

Import data

library(ggplot2)
data(CPS85, package="mosaicData")

Explore data

  • Start with a simple graph
  • Add new features as needed



We'll use ggplot2

    and add other packages as we go…

ggplot2 components

Setting up the graph

  • ggplot
  • geoms
  • grouping
  • facets

Customizing the graph

  • scales
  • labels
  • themes

ggplot

ggplot( data,  aes(…) )

  • data - a data frame
  • aes(…) - maps variable(s) to visual aspects of the graph
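
To make the call concrete, here is a small sketch using the CPS85 data from the case study. On its own, ggplot() only sets up the data and the mappings, so the first object draws empty axes. Points appear once a geom is added. The object names canvas and scatter are hypothetical.

```r
library(ggplot2)
data(CPS85, package = "mosaicData")

# Data plus aesthetic mappings: experience on x, hourly wage on y
canvas <- ggplot(data = CPS85, mapping = aes(x = exper, y = wage))
print(canvas)   # axes only, nothing is drawn yet

# Adding a geom draws the data
scatter <- canvas + geom_point()
print(scatter)
```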

geoms - geometric objects

  • geom_point
  • geom_bar
  • geom_line
  • geom_smooth
  • geom_histogram
  • geom_density
  • geom_boxplot
  • geom_abline
  • geom_vline
  • geom_hline
  • geom_ribbon
  • geom_errorbar
  • geom_text
  • and many more …
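
Geoms are layered with the + operator. The sketch below, again assuming the CPS85 case-study data, stacks three of the geoms listed above: points, a fitted line, and a horizontal reference line at the mean wage. The choice of a linear fit is only one option for illustration.

```r
library(ggplot2)
data(CPS85, package = "mosaicData")

p <- ggplot(CPS85, aes(x = exper, y = wage)) +
  geom_point(alpha = 0.5) +                        # one point per worker
  geom_smooth(method = "lm", formula = y ~ x) +    # linear fit with confidence band
  geom_hline(yintercept = mean(CPS85$wage),        # reference line at the mean wage
             linetype = "dashed")

print(p)
```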

Grouping

Plot more than one group of data in a graph using:

  • color
  • fill
  • shape
  • alpha
  • size
  • linetype

Legends are created automatically and can be customized.
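
A short sketch of grouping, assuming the CPS85 case-study data: mapping sex to color, shape, and linetype inside aes() splits the points and fitted lines into one group per sex, and ggplot2 builds the legend without further code.

```r
library(ggplot2)
data(CPS85, package = "mosaicData")

p <- ggplot(CPS85, aes(x = exper, y = wage, color = sex)) +
  geom_point(aes(shape = sex), alpha = 0.6) +      # shape also varies by group
  geom_smooth(aes(linetype = sex),                 # one fitted line per group
              method = "lm", formula = y ~ x, se = FALSE)

print(p)
```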

Color

  • color - refers to the color of points, lines, borders, text
  • fill - refers to the color of areas, bars, regions

List of colors:

http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
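
The difference between color and fill is easiest to see on a histogram. In this sketch, which assumes the CPS85 case-study data, color sets the bar outlines and fill sets the bar interiors. Both are set to constants outside aes(), so no legend is produced. The two color names are arbitrary picks from R's built-in named colors.

```r
library(ggplot2)
data(CPS85, package = "mosaicData")

p <- ggplot(CPS85, aes(x = wage)) +
  geom_histogram(bins = 30,
                 color = "white",       # bar borders
                 fill = "steelblue")    # bar interiors

print(p)
```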

Shape

Linetypes

Scales

Control how variables are mapped to visual aspects of the graph.

  • scale_x_continuous
  • scale_x_discrete
  • scale_x_date
  • scale_color_manual
  • scale_color_gradient
  • scale_color_brewer
  • and many more …
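
As an illustration, the sketch below adjusts two scales on a CPS85 scatterplot: scale_x_continuous sets the axis breaks and scale_color_manual assigns specific colors to the two levels of sex (F and M in this data set). The break spacing and the colors are arbitrary choices made for the example.

```r
library(ggplot2)
data(CPS85, package = "mosaicData")

p <- ggplot(CPS85, aes(x = exper, y = wage, color = sex)) +
  geom_point(alpha = 0.6) +
  scale_x_continuous(breaks = seq(0, 60, 10)) +    # tick marks every 10 years
  scale_color_manual(values = c("F" = "darkorange",
                                "M" = "steelblue"))  # one color per group

print(p)
```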

Facets

  • Tufte small multiples
  • Create one version of the plot for each level of one or more categorical variables
  • Creates a “mosaic” of plots in one graph
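
A minimal faceting sketch, assuming the CPS85 case-study data: facet_wrap() repeats the same scatterplot once per job sector and arranges the panels in a grid. For two faceting variables, facet_grid() is the usual alternative.

```r
library(ggplot2)
data(CPS85, package = "mosaicData")

p <- ggplot(CPS85, aes(x = exper, y = wage)) +
  geom_point(alpha = 0.5) +
  facet_wrap(~ sector)    # one panel per level of sector

print(p)
```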

Labels

  • Title
  • Subtitle
  • Caption
  • Legend title
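
All of these labels can be set in a single labs() call. The sketch below assumes the CPS85 case-study data. The wording of the labels is illustrative, and the legend title is set through the name of the aesthetic that produced the legend (color here).

```r
library(ggplot2)
data(CPS85, package = "mosaicData")

p <- ggplot(CPS85, aes(x = exper, y = wage, color = sex)) +
  geom_point(alpha = 0.6) +
  labs(title = "Hourly wage by years of experience",
       subtitle = "Current Population Survey, 1985",
       caption = "Source: CPS85 data set, mosaicData package",
       x = "Years of experience",
       y = "Hourly wage",
       color = "Sex")     # legend title

print(p)
```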

Themes

Themes modify the non-data components of the plot:

  • fonts
  • gridlines
  • plot background and margins
  • appearance and placement of legend
  • appearance of axes
  • appearance of facet strips

  • Themes can be canned (e.g. theme_minimal) or custom
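
The two approaches combine naturally: start from a canned theme, then override individual elements with theme(). This sketch assumes the CPS85 case-study data, and the specific tweaks (legend position, minor gridlines, bold title) are arbitrary examples.

```r
library(ggplot2)
data(CPS85, package = "mosaicData")

p <- ggplot(CPS85, aes(x = exper, y = wage, color = sex)) +
  geom_point(alpha = 0.6) +
  labs(title = "Hourly wage by years of experience") +
  theme_minimal() +                                  # canned theme
  theme(legend.position = "bottom",                  # custom overrides
        panel.grid.minor = element_blank(),
        plot.title = element_text(face = "bold"))

print(p)
```

The call to theme() comes after theme_minimal() because a complete theme replaces any earlier theme settings.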