Multivariate Graphs

Rob Kabacoff
Saturday, April 23, 2022

Bivariate Graphs

Categorical vs. Categorical

  • bar chart (stacked, grouped)

Quantitative vs. Quantitative

  • scatter plot
  • line plot

Categorical vs. Quantitative

  • box and violin plots
  • mean/se plots
  • ridgeline and grouped kernel density plots
  • strip and beeswarm plots
  • Cleveland plots

Multivariate extensions

  • grouping
  • faceting

  • bubble chart
  • correlation matrix
  • 3D scatter plot
  • mosaic chart
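
Grouping and faceting are listed above without code, so here is a minimal sketch of both. It assumes the built-in mtcars data set; the choice of wt, mpg, and cyl is purely illustrative and not from the original slides.

```r
library(ggplot2)

# Grouping: map a third variable (number of cylinders) to color,
# so all groups share a single panel.
p_grouped <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(color = "Cylinders")

# Faceting: draw one panel per level of the third variable instead.
p_faceted <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl)

print(p_grouped)
print(p_faceted)
```

Grouping tends to work well for a handful of levels; faceting is usually easier to read when the groups overlap heavily.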

Bar Chart

Two categorical variables

ggplot(data, aes(x = , fill = )) +
  geom_bar(position =)

Position

  • stack
  • dodge
  • fill
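
A small sketch of the three position settings, assuming the mpg data set that ships with ggplot2 (class and drv are used here only as example categorical variables):

```r
library(ggplot2)

# Stacked: segment heights are counts, stacked within each class.
p_stack <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "stack")

# Grouped: one bar per class/drv combination, placed side by side.
p_dodge <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "dodge")

# Filled: every bar is rescaled to height 1, so segments show proportions.
p_fill <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion")

print(p_stack)
print(p_dodge)
print(p_fill)
```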

Scatter plot

Two quantitative variables

ggplot(data, aes(x =, y = )) +
  geom_point()

Common options

  • color
  • alpha
  • size

Scatter plot - with fit line

ggplot(data, aes(x =, y = )) +
  geom_point() +
  geom_smooth(method = , formula = , se =)

For geom_smooth

  • method: "lm", "loess", "gam"
  • formula examples: y ~ x, y ~ poly(x, 2)
  • se: display the confidence band? TRUE or FALSE
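
As an illustration of these arguments, the sketch below fits a quadratic regression line with a confidence band. The mtcars variables are an assumption made for the example.

```r
library(ggplot2)

# Scatter plot of weight vs. fuel economy with a quadratic
# least-squares fit and its confidence band.
p_fit <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = TRUE)

print(p_fit)
```

Setting se = FALSE would draw the fitted curve without the shaded band.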

Box plots

ggplot(data, aes(x =, y = )) +
  geom_boxplot()

Common options

  • fill
  • outlier.color
  • outlier.size
  • notch

Violin plots

ggplot(data, aes(x =, y =)) +
  geom_violin()

Common options

  • color
  • fill

Mean/Standard Error plot

library(dplyr)
plotdata <- data %>%
  group_by(x) %>%
  summarize(n = n(),
            mean = mean(y),
            se = sd(y) / sqrt(n))

ggplot(plotdata, 
       aes(x = x, y = mean, group = 1)) +
  geom_point() +
  geom_line() +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), 
                width = )
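
A runnable version of this template might look like the following sketch. It assumes the built-in iris data, with Species as the grouping variable and Sepal.Length as the quantitative variable; neither choice comes from the original slides.

```r
library(dplyr)
library(ggplot2)

# Summarize: group size, mean, and standard error (sd / sqrt(n)) per group.
plotdata <- iris %>%
  group_by(Species) %>%
  summarize(n = n(),
            mean = mean(Sepal.Length),
            se = sd(Sepal.Length) / sqrt(n))

# Plot the group means, connect them, and add +/- 1 SE error bars.
p_mean <- ggplot(plotdata,
                 aes(x = Species, y = mean, group = 1)) +
  geom_point(size = 3) +
  geom_line() +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
                width = 0.1)

print(p_mean)
```

The group = 1 aesthetic tells geom_line() to connect the points even though the x variable is categorical.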

Grouped Kernel density plot

ggplot(data, aes(x = , fill = )) +
  geom_density(alpha = )

Common options

  • alpha
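
For instance, assuming the built-in iris data (an illustrative choice), overlapping densities for each species could be sketched as:

```r
library(ggplot2)

# One density curve per species; alpha < 1 keeps overlapping
# curves visible through each other.
p_density <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_density(alpha = 0.4)

print(p_density)
```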

Ridgeline plot

library(ggridges)
ggplot(data, 
       aes(x = , y =, fill =)) +
  geom_density_ridges() + 
  theme_ridges() +
  theme(legend.position = "none")
  • can handle many categories

Strip plot

ggplot(data, aes(x =, y = )) +
  geom_jitter()
  • can be combined with color mapping and boxplots
  • can also be used in scatter plots to avoid overplotting
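
One possible way to combine a strip plot with color mapping and box plots is sketched below. The iris variables, the jitter width, and the seed are assumptions made for the example.

```r
library(ggplot2)

set.seed(1234)  # jitter is random; fix the seed for a reproducible plot

p_strip <- ggplot(iris, aes(x = Species, y = Petal.Length, color = Species)) +
  # Hide the box plot's own outlier points so they are not drawn twice.
  geom_boxplot(outlier.shape = NA, color = "grey40") +
  # Jitter horizontally only, so the plotted y values stay exact.
  geom_jitter(width = 0.2, height = 0, alpha = 0.6) +
  theme(legend.position = "none")

print(p_strip)
```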

Beeswarm plot

library(ggbeeswarm)
ggplot(data, aes(x = , y =)) +
  geom_quasirandom() 

Common options

  • alpha
  • size
  • can be combined with color mapping

Cleveland plot

Categorical variable with many levels vs. quantitative variable

ggplot(plotdata, aes(x=quantvar, y=reorder(catvar, quantvar))) +
  geom_point()
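
As a concrete sketch, the car names in the built-in mtcars data can serve as a categorical variable with many levels. The column names car and mpg below are assumptions for the example.

```r
library(ggplot2)

# One row per car: the name (categorical) and its fuel economy (quantitative).
plotdata <- data.frame(car = rownames(mtcars), mpg = mtcars$mpg)

# reorder() sorts the car names by mpg, so the dots form a readable ranking.
p_dot <- ggplot(plotdata, aes(x = mpg, y = reorder(car, mpg))) +
  geom_point() +
  labs(y = NULL)

print(p_dot)
```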

Bubble plot

3 quantitative variables

ggplot(data, aes(x = , y = , size = )) +
  geom_point()

Common options

  • shape (e.g. 21 with fill and color)
  • alpha
  • add scale_size_continuous(range = c(1, 10)) to set the range of bubble sizes
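
Putting these options together, a bubble plot might be sketched as follows. The mtcars variables and the fill color are illustrative assumptions.

```r
library(ggplot2)

# Weight vs. fuel economy, with horsepower mapped to bubble size.
# Shape 21 is a filled circle: fill sets the interior, color the border.
p_bubble <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  geom_point(shape = 21, fill = "steelblue", color = "black", alpha = 0.6) +
  scale_size_continuous(range = c(1, 10))

print(p_bubble)
```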

Correlation matrix plot

r <- cor(df, use="complete.obs")
library(ggcorrplot)
ggcorrplot(r, 
           hc.order = TRUE, # order variables into clusters? 
           type = "lower",  # upper, lower, or full matrix?
           lab = TRUE)      # include numeric labels?

Mosaic plot

2 or more categorical variables

library(vcd)
mosaic(~var1 + var2 + ..., data, shade=TRUE)

hard to read after 4 variables

3D Scatter plot

static

library(scatterplot3d)
with(data, scatterplot3d(x=, y=, z=))

interactive

library(car)  # scatter3d() also requires the rgl package to be installed
with(data, scatter3d(x=, y=, z=))