Chapter 10 Customizing Graphs
Graph defaults are fine for quick data exploration, but when you want to publish your results to a blog, paper, article or poster, you’ll probably want to customize the results. Customization can improve the clarity and attractiveness of a graph.
This chapter describes how to customize a graph’s axes, gridlines, colors, fonts, labels, and legend. It also describes how to add annotations (text and lines).
10.1 Axes
The x-axis and y-axis represent numeric, categorical, or date values. You can modify the default scales and labels with the functions below.
10.1.1 Quantitative axes
A quantitative axis is modified using the scale_x_continuous and scale_y_continuous functions. Options include
- breaks: a numeric vector of positions
- limits: a numeric vector with the min and max for the scale
# customize numerical x and y axes
library(ggplot2)
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  scale_x_continuous(breaks = seq(1, 7, 1),
                     limits = c(1, 7)) +
  scale_y_continuous(breaks = seq(10, 45, 5),
                     limits = c(10, 45))
10.1.1.1 Numeric formats
The scales package provides a number of functions for formatting numeric labels. Some of the most useful are comma, percent, and dollar.
Let’s demonstrate these functions with some synthetic data.
# create some data
set.seed(1234)
df <- data.frame(xaxis = rnorm(50, 100000, 50000),
                 yaxis = runif(50, 0, 1),
                 pointsize = rnorm(50, 1000, 1000))
library(ggplot2)

# plot the axes and legend with formats
ggplot(df, aes(x = xaxis, y = yaxis, size = pointsize)) +
  geom_point(color = "cornflowerblue", alpha = .6) +
  scale_x_continuous(labels = scales::comma) +
  scale_y_continuous(labels = scales::percent) +
  scale_size(range = c(1, 10), # point size range
             labels = scales::dollar)
To format currency values as euros, you can use labels = scales::dollar_format(prefix = "", suffix = "\u20ac").
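To see what these formatters do before attaching them to a scale, it can help to call them directly on a few numbers. The sketch below assumes the scales package is installed; the input values and the name euro are made up for illustration.

```r
library(scales)

# each formatter takes a numeric vector and returns character labels
comma(1234567)   # "1,234,567"
percent(0.25)    # "25%"
dollar(1000)     # "$1,000"

# dollar_format() returns a function; here it is configured for euros
# (euro is a hypothetical name chosen for this example)
euro <- dollar_format(prefix = "", suffix = "\u20ac")
euro(1000)       # "1,000" followed by the euro sign
```

Because each formatter is an ordinary function, it can be passed to the labels argument of any scale, as in the example above.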
10.1.2 Categorical axes
A categorical axis is modified using the scale_x_discrete and scale_y_discrete functions. Options include
- limits: a character vector (the levels of the categorical variable in the desired order)
- labels: a character vector of labels (optional labels for these levels)
library(ggplot2)
# customize categorical x axis
ggplot(mpg, aes(x = class)) +
  geom_bar(fill = "steelblue") +
  scale_x_discrete(limits = c("pickup", "suv", "minivan",
                              "midsize", "compact", "subcompact",
                              "2seater"),
                   labels = c("Pickup\nTruck", "Sport Utility\nVehicle",
                              "Minivan", "Mid-size", "Compact",
                              "Subcompact", "2-Seater"))
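The limits vector does not have to be typed by hand. As a minimal sketch, the order can be computed from the data, here by sorting the classes from most to least frequent. The names class_counts, class_order, and p_ordered are illustrative and not part of the original example.

```r
library(ggplot2)

# count the cars in each class and sort from most to least common
class_counts <- sort(table(mpg$class), decreasing = TRUE)
class_order <- names(class_counts)

# use the computed order as the limits of the discrete x axis
p_ordered <- ggplot(mpg, aes(x = class)) +
  geom_bar(fill = "steelblue") +
  scale_x_discrete(limits = class_order)
p_ordered
```

The bars then appear in descending order of height, which is often easier to read than an arbitrary or alphabetical order.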
10.1.3 Date axes
A date axis is modified using the scale_x_date and scale_y_date functions. Options include
- date_breaks: a string giving the distance between breaks like “2 weeks” or “10 years”
- date_labels: a string giving the formatting specification for the labels
The table below gives the formatting specifications for date values.
|%d||day as a number (01-31)||01-31|
library(ggplot2)
# customize date scale on x axis
ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line(color = "darkgreen") +
  scale_x_date(date_breaks = "5 years",
               date_labels = "%b-%y")
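The date_labels string uses the same formatting codes as base R's format function, so a specification can be tried on a single date before it is used in a plot. This is a small sketch with an arbitrary date; month names depend on the locale, so no output is shown for those.

```r
# a single date to experiment with (arbitrary choice)
d <- as.Date("2024-03-05")

format(d, "%d")      # day of the month as a two-digit number
format(d, "%m")      # month as a two-digit number
format(d, "%y")      # two-digit year
format(d, "%Y")      # four-digit year
format(d, "%b-%y")   # abbreviated month name and two-digit year
```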
Here is a help sheet for modifying scales developed from the online help.
10.2 Colors
The default colors in ggplot2 graphs are functional, but often not as visually appealing as they can be. Happily, this is easy to change.
Specific colors can be
- specified for points, lines, bars, areas, and text, or
- mapped to the levels of a variable in the dataset.
10.2.1 Specifying colors manually
To specify a color for points, lines, or text, use the
color = "colorname" option in the appropriate geom. To specify a color for bars and areas, use the
fill = "colorname" option.
geom_point(color = "blue")
geom_bar(fill = "steelblue")
Colors can be specified by name or hex code.
To assign colors to the levels of a variable, use the scale_color_manual and scale_fill_manual functions. The former is used to specify the colors for points and lines, while the latter is used for bars and areas.
Here is an example, using the
diamonds dataset that ships with
ggplot2. The dataset contains the prices and attributes of 54,000 round cut diamonds.
# specify fill color manually
library(ggplot2)
ggplot(diamonds, aes(x = cut, fill = clarity)) +
  geom_bar() +
  scale_fill_manual(values = c("darkred", "steelblue",
                               "darkgreen", "gold",
                               "brown", "purple",
                               "grey", "khaki4"))
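The same approach works for points and lines with scale_color_manual. The sketch below uses a named vector so that each level of drv is tied to a specific color regardless of level order. The color choices and the names drv_colors and p_manual are illustrative.

```r
library(ggplot2)

# named vector: names are the levels of drv, values are the colors
drv_colors <- c("4" = "darkred", "f" = "steelblue", "r" = "darkgreen")

p_manual <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(size = 2) +
  scale_color_manual(values = drv_colors)
p_manual
```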
If you are aesthetically challenged like me, an alternative is to use a predefined palette.
10.2.2 Color palettes
There are many predefined color palettes available in R.
The most popular alternative palettes are probably the ColorBrewer palettes.
You can specify these palettes with the scale_color_brewer and scale_fill_brewer functions.
# use a ColorBrewer fill palette
ggplot(diamonds, aes(x = cut, fill = clarity)) +
  geom_bar() +
  scale_fill_brewer(palette = "Dark2")
Adding direction = -1 to these functions reverses the order of the colors in a palette.
The viridis palette is another popular choice.
For continuous scales use scale_fill_viridis_c and scale_color_viridis_c.
For discrete (categorical) scales use scale_fill_viridis_d and scale_color_viridis_d.
# Use a viridis fill palette
ggplot(diamonds, aes(x = cut, fill = clarity)) +
  geom_bar() +
  scale_fill_viridis_d()
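For comparison, here is a sketch of the continuous version, mapping a numeric variable (city mileage) to point color. The choice of variable and the reversed direction are assumptions made only for illustration.

```r
library(ggplot2)

# color points by a continuous variable using the viridis palette;
# direction = -1 reverses the palette order
p_viridis <- ggplot(mpg, aes(x = displ, y = hwy, color = cty)) +
  geom_point(size = 3) +
  scale_color_viridis_c(direction = -1)
p_viridis
```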
10.2.2.3 Other palettes
If you want to explore all the palette options (or nearly all), take a look at the paletteer package.
10.3 Points & Lines
For ggplot2 graphs, the default point is a filled circle. To specify a different shape, use the shape = # option in the geom_point function. To map shapes to the levels of a categorical variable, use the shape = variablename option in the aes function.
geom_point(shape = 1)
geom_point(aes(shape = sex))
Available shapes are given in the table below.
Shapes 21 through 26 provide for both a fill color and a border color.
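As a brief sketch of the fill and border distinction, shape 21 (a circle) can be given a border with color and an interior with fill. The specific colors and the name p_shapes are arbitrary.

```r
library(ggplot2)

# shape 21 is a circle with a separate border (color) and interior (fill)
p_shapes <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(shape = 21, color = "black", fill = "steelblue", size = 3)
p_shapes
```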
The default line type is a solid line. To change the linetype, use the linetype = # option in the geom_line function. To map linetypes to the levels of a categorical variable, use the linetype = variablename option in the aes function.
geom_line(linetype = 1)
geom_line(aes(linetype = sex))
Available linetypes are given in the table below.
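Linetypes can also be given by name ("solid", "dashed", "dotted", and so on). The sketch below maps linetype to a variable and then chooses the linetypes explicitly with scale_linetype_manual. It assumes the economics_long dataset that ships with ggplot2; the two series were picked arbitrarily.

```r
library(ggplot2)

# keep two of the series in economics_long (arbitrary choice)
two_series <- subset(economics_long,
                     variable %in% c("psavert", "uempmed"))

# map linetype to the series, then assign linetypes by name
p_lines <- ggplot(two_series,
                  aes(x = date, y = value, linetype = variable)) +
  geom_line() +
  scale_linetype_manual(values = c("psavert" = "solid",
                                   "uempmed" = "dashed"))
p_lines
```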
10.4 Fonts
R does not have great support for fonts, but with a bit of work, you can change the fonts that appear in your graphs. First you need to install and set up the extrafont package.
# one time install
install.packages("extrafont")
library(extrafont)
font_import()

# see what fonts are now available
fonts()
Apply the new font(s) using the text option in the theme function.
# specify new font
library(extrafont)
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(title = "Displacement by Highway Mileage",
       subtitle = "MPG dataset") +
  theme(text = element_text(size = 16, family = "Comic Sans MS"))
To learn more about customizing fonts, see Working with R, Cairo graphics, custom fonts, and ggplot.
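If installing fonts is not an option, R's generic font families ("sans", "serif", and "mono") are usually available without extra setup. The following is a sketch under that assumption; which typeface each family resolves to depends on the graphics device and operating system.

```r
library(ggplot2)

# use a generic family instead of an imported font
p_serif <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(title = "Displacement by Highway Mileage") +
  theme(text = element_text(size = 16, family = "serif"))
p_serif
```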
10.5 Legends
In ggplot2, legends are automatically created when variables are mapped to color, fill, linetype, shape, size, or alpha.
You have a great deal of control over the look and feel of these legends. Modifications are usually made through the
theme function and/or the
labs function. Here are some of the most sought after.
10.5.1 Legend location
The legend can appear anywhere in the graph. By default, it’s placed on the right. You can change the default with
theme(legend.position = position)
|“top”||above the plot area|
|“right”||right of the plot area|
|“bottom”||below the plot area|
|“left”||left of the plot area|
|c(x, y)||within the plot area. The x and y values must range between 0 and 1. c(0,0) represents (left, bottom) and c(1,1) represents (right, top).|
|“none”||suppress the legend|
For example, to place the legend at the top, use the following code.
# place legend on top
ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point(size = 4) +
  labs(title = "Displacement by Highway Mileage") +
  theme_minimal() +
  theme(legend.position = "top")
10.5.2 Legend title
You can change the legend title through the labs function. Use color, fill, size, shape, linetype, and alpha to give new titles to the corresponding legends.
The alignment of the legend title is controlled through the
legend.title.align option in the
theme function. (0=left, 0.5=center, 1=right)
# change the default legend title
ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point(size = 4) +
  labs(title = "Displacement by Highway Mileage",
       color = "Automobile\nClass") +
  theme_minimal() +
  theme(legend.title.align = 0.5)
See Hadley Wickham’s legend attributes for more details.
10.6 Labels
Labels are a key ingredient in rendering a graph understandable. They are added with the labs function. Available options are given below.
|caption||caption (bottom right by default)|
|color||color legend title|
|fill||fill legend title|
|size||size legend title|
|linetype||linetype legend title|
|shape||shape legend title|
|alpha||transparency legend title|
# add plot labels
ggplot(mpg, aes(x = displ, y = hwy,
                color = class,
                shape = factor(year))) +
  geom_point(size = 3, alpha = .5) +
  labs(title = "Mileage by engine displacement",
       subtitle = "Data from 1999 and 2008",
       caption = "Source: EPA (http://fueleconomy.gov)",
       x = "Engine displacement (litres)",
       y = "Highway miles per gallon",
       color = "Car Class",
       shape = "Year") +
  theme_minimal()
10.7 Annotations
Annotations are additional information added to a graph to highlight important points.
10.7.1 Adding text
There are two primary reasons to add text to a graph.
One is to identify the numeric qualities of a geom. For example, we may want to identify points with labels in a scatterplot, or label the heights of bars in a bar chart.
Another reason is to provide additional information. We may want to add notes about the data, point out outliers, etc.
10.7.1.1 Labeling values
Consider the following scatterplot, based on the car data in the mtcars dataset.
# basic scatterplot
data(mtcars)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
Let’s label each point with the name of the car it represents.
# scatterplot with labels
data(mtcars)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_text(label = row.names(mtcars))
The overlapping labels make this chart difficult to read. There is a package called
ggrepel that can help us here.
# scatterplot with non-overlapping labels
data(mtcars)
library(ggrepel)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_text_repel(label = row.names(mtcars),
                  size = 3)
Adding labels to bar charts is covered in the aptly named labeling bars section.
10.7.1.2 Adding additional information
We can place text anywhere on a graph using the
annotate function. The format is
annotate("text", x, y, label = "Some text", color = "colorname", size=textsize)
where x and y are the coordinates on which to place the text. The color and size parameters are optional.
By default, the text will be centered. Use hjust and vjust to change the alignment.
- hjust: 0 = left justified, 0.5 = centered, and 1 = right justified.
- vjust: 0 = above, 0.5 = centered, and 1 = below.
Continuing the previous example.
# scatterplot with explanatory text
data(mtcars)
library(ggrepel)
txt <- paste("The relationship between car weight",
             "and mileage appears to be roughly linear",
             sep = "\n")
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "red") +
  geom_text_repel(label = row.names(mtcars),
                  size = 3) +
  ggplot2::annotate("text", 6, 30,
                    label = txt,
                    color = "red",
                    hjust = 1) +
  theme_bw()
See this blog post for more details.
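To make the alignment options concrete, here is a minimal sketch that anchors a note at the lower-left corner of the data range. With hjust = 0 the text starts at the x coordinate and runs to the right, and with vjust = 0 it sits above the y coordinate. The note text and the name p_note are invented for this example.

```r
library(ggplot2)

# anchor a note at the lower-left corner of the data range
p_note <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  annotate("text",
           x = min(mtcars$wt), y = min(mtcars$mpg),
           label = "Heavier cars get fewer miles per gallon",
           hjust = 0, vjust = 0, size = 4)
p_note
```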
10.7.2 Adding lines
Horizontal and vertical lines can be added using:
geom_hline(yintercept = a)
geom_vline(xintercept = b)
where a is a number on the y-axis and b is a number on the x-axis. Other options include color and linetype.
# add annotation line and text label
min_cty <- min(mpg$cty)
mean_hwy <- mean(mpg$hwy)
ggplot(mpg, aes(x = cty, y = hwy, color = drv)) +
  geom_point(size = 3) +
  geom_hline(yintercept = mean_hwy,
             color = "darkred",
             linetype = "dashed") +
  ggplot2::annotate("text",
                    min_cty,
                    mean_hwy + 1,
                    label = "Mean",
                    color = "darkred") +
  labs(title = "Mileage by drive type",
       x = "City miles per gallon",
       y = "Highway miles per gallon",
       color = "Drive")
We could add a vertical line for the mean city miles per gallon as well. In any case, always label annotation lines in some way. Otherwise the reader will not know what they mean.
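A sketch of that variation follows, with a dashed vertical line at the mean city mileage and a label for each line. The label wording, the small offsets, and the name p_means are arbitrary choices for illustration.

```r
library(ggplot2)

mean_cty <- mean(mpg$cty)
mean_hwy <- mean(mpg$hwy)

p_means <- ggplot(mpg, aes(x = cty, y = hwy, color = drv)) +
  geom_point(size = 3) +
  # horizontal line at the mean highway mileage
  geom_hline(yintercept = mean_hwy,
             color = "darkred", linetype = "dashed") +
  # vertical line at the mean city mileage
  geom_vline(xintercept = mean_cty,
             color = "darkred", linetype = "dashed") +
  # label each line so the reader knows what it represents
  annotate("text", x = min(mpg$cty), y = mean_hwy + 1,
           label = "Mean hwy", color = "darkred", hjust = 0) +
  annotate("text", x = mean_cty + 0.3, y = max(mpg$hwy),
           label = "Mean cty", color = "darkred", hjust = 0)
p_means
```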
10.7.3 Highlighting a single group
Sometimes you want to highlight a single group in your graph. The
gghighlight function in the
gghighlight package is designed for this.
Here is an example with a scatterplot.
# highlight a set of points
library(ggplot2)
library(gghighlight)
ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point(color = "red", size = 2) +
  gghighlight(class == "midsize")
Below is an example with a bar chart.
# highlight a single bar
library(gghighlight)
ggplot(mpg, aes(x = class)) +
  geom_bar(fill = "red") +
  gghighlight(class == "midsize")
There is nothing here that could not be done with base graphics, but it is more convenient.
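For comparison, a similar effect can be approximated with ggplot2 alone by drawing all points in grey and then overplotting the group of interest. This is a sketch of one possible approach, not a description of how gghighlight works internally; the grey shade and the names midsize and p_highlight are arbitrary.

```r
library(ggplot2)

# subset containing only the group to highlight
midsize <- subset(mpg, class == "midsize")

p_highlight <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point(color = "grey80", size = 2) +              # all cars, muted
  geom_point(data = midsize, color = "red", size = 2)   # midsize cars on top
p_highlight
```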
10.8 Themes
ggplot2 themes control the appearance of all non-data related components of a plot. You can change the look and feel of a graph by altering the elements of its theme.
10.8.1 Altering theme elements
The theme function is used to modify individual components of a theme.
The parameters of the
theme function are described in a cheatsheet developed from the online help.
Consider the following graph. It shows the number of male and female faculty by rank and discipline at a particular university in 2008-2009. The data come from the Salaries for Professors dataset.
# create graph
data(Salaries, package = "carData")
p <- ggplot(Salaries, aes(x = rank, fill = sex)) +
  geom_bar() +
  facet_wrap(~discipline) +
  labs(title = "Academic Rank by Gender and Discipline",
       x = "Rank",
       y = "Frequency",
       fill = "Gender")
p
Let’s make some changes to the theme.
- Change label text from black to navy blue
- Change the panel background color from grey to white
- Add solid grey lines for major y-axis grid lines
- Add dashed grey lines for minor y-axis grid lines
- Eliminate x-axis grid lines
- Change the strip background color to white with a grey border
Using the cheat sheet gives us
p +
  theme(text = element_text(color = "navy"),
        panel.background = element_rect(fill = "white"),
        panel.grid.major.y = element_line(color = "grey"),
        panel.grid.minor.y = element_line(color = "grey",
                                          linetype = "dashed"),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        strip.background = element_rect(fill = "white", color = "grey"))
Wow, this looks pretty awful, but you get the idea.
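Theme modifications can also be stored in an object and reused across graphs. The sketch below starts from theme_minimal and layers a few changes on top; the name my_theme and the particular settings are illustrative.

```r
library(ggplot2)

# a reusable theme: start from a built-in theme and override elements
my_theme <- theme_minimal() +
  theme(text = element_text(color = "navy"),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())

# apply it to any plot by adding it like a layer
p_cars <- ggplot(mpg, aes(x = class)) +
  geom_bar(fill = "steelblue") +
  my_theme
p_cars
```

Keeping the theme in one object means a change to it carries over to every graph that uses it.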
If you would like to create your own theme using a GUI, take a look at
ggThemeAssist. After you install the package, a new menu item will appear under Addins in RStudio.
Highlight the code that creates your graph, then choose the
ggThemeAssist option from the Addins drop-down menu. You can change many of the features of your theme using point-and-click. When you’re done, the
theme code will be appended to your graph code.
10.8.2 Pre-packaged themes
I’m not a very good artist (just look at the last example), so I often look for pre-packaged themes that can be applied to my graphs. There are many available.
Some come with ggplot2. These include theme_classic, theme_dark, theme_gray, theme_grey, theme_light, theme_linedraw, theme_minimal, and theme_void. We’ve used theme_minimal often in this book. Others are available through add-on packages.
The ggthemes package comes with 19 themes.
|theme_economist||ggplot color theme based on the Economist|
|theme_economist_white||ggplot color theme based on the Economist|
|theme_excel||ggplot color theme based on old Excel plots|
|theme_few||Theme based on Few’s “Practical Rules for Using Color in Charts”|
|theme_fivethirtyeight||Theme inspired by fivethirtyeight.com plots|
|theme_gdocs||Theme with Google Docs Chart defaults|
|theme_hc||Highcharts JS theme|
|theme_igray||Inverse gray theme|
|theme_map||Clean theme for maps|
|theme_pander||A ggplot theme originated from the pander package|
|theme_par||Theme which takes its values from the current ‘base’ graphics parameter values in ‘par’.|
|theme_solarized||ggplot color themes based on the Solarized palette|
|theme_solarized_2||ggplot color themes based on the Solarized palette|
|theme_solid||Theme with nothing other than a background color|
|theme_stata||Themes based on Stata graph schemes|
|theme_tufte||Tufte Maximal Data, Minimal Ink Theme|
|theme_wsj||Wall Street Journal theme|
To demonstrate their use, we’ll first create and save a graph.
# create basic plot
library(ggplot2)
p <- ggplot(mpg, aes(x = displ, y = hwy,
                     color = class)) +
  geom_point(size = 3, alpha = .5) +
  labs(title = "Mileage by engine displacement",
       subtitle = "Data from 1999 and 2008",
       caption = "Source: EPA (http://fueleconomy.gov)",
       x = "Engine displacement (litres)",
       y = "Highway miles per gallon",
       color = "Car Class")

# display graph
p
Now let’s apply some themes.
# add economist theme
library(ggthemes)
p + theme_economist()

# add fivethirtyeight theme
p + theme_fivethirtyeight()

# add wsj theme
p + theme_wsj(base_size = 8)
By default, the font size for the wsj theme is usually too large. Changing the
base_size option can help.
Each theme also comes with scales for colors and fills. In the next example, both the
few theme and colors are used.
# add few theme
p + theme_few() + scale_color_few()
Try out different themes and scales to find one that you like.
The hrbrthemes package is focused on typography-centric themes. The results are charts that tend to have a clean look.
Continuing the example plot from above:
# add ipsum theme
library(hrbrthemes)
p + theme_ipsum()
See the hrbrthemes homepage for additional examples.
The ggthemr package offers a wide range of themes (17 as of this printing).
The package is not available on CRAN and must be installed from GitHub.
# one time install
install.packages("devtools")
devtools::install_github('cttobin/ggthemr')
The functions work a bit differently. Use the
ggthemr("themename") function to set future graphs to a given theme. Use
ggthemr_reset() to return future graphs to the
ggplot2 default theme.
Current themes include flat, flat dark, camouflage, chalk, copper, dust, earth, fresh, grape, grass, greyscale, light, lilac, pale, sea, sky, and solarized.
# set graphs to the flat dark theme
library(ggthemr)
ggthemr("flat dark")
p
I would not actually use this theme for this particular graph. It is difficult to distinguish colors. Which green represents compact cars and which represents subcompact cars?
Select a theme that best conveys the graph’s information to your audience.