Chapter 6 Maps

R provides a myriad of methods for creating both static and interactive maps containing statistical information. This section focuses on the use of ggmap and choroplethr.

6.1 Dot density maps

Dot density maps use points on a map to explore spatial relationships.

The Houston crime dataset contains the date, time, and address of six types of criminal offenses reported between January and August 2010. The longitude and latitude of each offence was added using geocode function, which takes an address and returns coordinates using the Google Maps API.

We’ll use this dataset to plot the locations of rape reports.

library(ggmap)

# subset the data
library(dplyr)
rapes <- filter(crime, offense == "rape") %>%
  select(date, offense, address, lon, lat)

# view data
head(rapes)
##       date offense            address       lon      lat
## 1 1/1/2010    rape   5950 glenmont dr -95.48498 29.72007
## 2 1/1/2010    rape    2350 sperber ln -95.34817 29.75505
## 3 1/1/2010    rape   5850 mackinaw rd -95.47353 29.60021
## 4 1/1/2010    rape 5850 southwest fwy -95.48174 29.72603
## 5 1/2/2010    rape  7550 corporate dr -95.55224 29.69836
## 6 1/2/2010    rape   1150 fidelity st -95.25535 29.74147

Let’s set up the map.

  1. Find the center coordinates for Houston, TX
# using geocode function returns 
# lon=-95.3698, lat=29.76043
houston_center <- geocode("Houston, TX")
  1. Get the background map image.
  • Specify a zoom factor from 3 (continent) to 21 (building). The default is 10 (city).
  • Specify a map type. Types include terrain, terrain-background, satellite, roadmap, hybrid, watercolor, and toner.
# get Houston map
houston_map <- get_map(houston_center, 
                       zoom = 13, 
                       maptype = "roadmap")
ggmap(houston_map)
Houston map

Figure 6.1: Houston map

  1. Add crime locations to the map.
# add incident locations
ggmap(houston_map, 
      base_layer = ggplot(data = rapes,
                          aes(x=lon, y = lat))) +
  geom_point(color = "red",
             size = 3,
             alpha = 0.5)
Crime locations

Figure 6.2: Crime locations

  1. Clean up the plot and add labels.
# remove long and lat numbers and add titles
ggmap(houston_map, 
      base_layer = ggplot(aes(x=lon, y = lat), 
                          data = rapes)) +
  geom_point(color = "red",
             size = 3,
             alpha = 0.5) +
  theme_void() +
  labs(title = "Location of reported rapes",
       subtitle = "Houston Jan - Aug 2010",
       caption = "source: http://www.houstontx.gov/police/cs/")
Crime locations with titles, and without longitude and latitude

Figure 6.3: Crime locations with titles, and without longitude and latitude

There seems to be a concentration of rape reports in midtown.

To learn more about ggmap, see ggmap: Spatial Visualization with ggplot2.

6.2 Choropleth maps

Choropleth maps use color or shading on predefined areas to indicate average values of a numeric variable in that area. In this section we’ll use the choroplethr package to create maps that display information by country, US state, and US county.

6.2.1 Data by country

Let’s create a world map and color the countries by life expectancy using the 2007 gapminder data.

The choroplethr package has numerous functions that simplify the task of creating a choropleth map. To plot the life expectancy data, we’ll use the country_choropleth function.

The function requires that the data frame to be plotted has a column named region and a column named value. Additionally, the entries in the region column must exactly match how the entries are named in the region column of the dataset country.map from the choroplethrMaps package.

# view the first 12 region names in country.map
data(country.map, package = "choroplethrMaps")
head(unique(country.map$region), 12)
##  [1] "afghanistan" "angola"      "azerbaijan"  "moldova"     "madagascar" 
##  [6] "mexico"      "macedonia"   "mali"        "myanmar"     "montenegro" 
## [11] "mongolia"    "mozambique"

Note that the region entries are all lower case.

To continue, we need to make some edits to our gapminder dataset. Specifically, we need to

  1. select the 2007 data
  2. rename the country variable to region
  3. rename the lifeExp variable to value
  4. recode region values to lower case
  5. recode some region values to match the region values in the country.map data frame. The recode function in the dplyr package take the form recode(variable, oldvalue1 = newvalue1, oldvalue2 = newvalue2, ...)
# prepare dataset
data(gapminder, package = "gapminder")
plotdata <- gapminder %>%
  filter(year == 2007) %>%
  rename(region = country,
         value = lifeExp) %>%
  mutate(region = tolower(region)) %>%
  mutate(region = recode(region,
                          "united states"    = "united states of america",
                          "congo, dem. rep." = "democratic republic of the congo",
                          "congo, rep."      = "republic of congo",
                          "korea, dem. rep." = "south korea",
                          "korea. rep."      = "north korea",
                          "tanzania"         = "united republic of tanzania",
                          "serbia"           = "republic of serbia",
                          "slovak republic"  = "slovakia",
                          "yemen, rep."      = "yemen"))

Now lets create the map.

library(choroplethr)
country_choropleth(plotdata)
Choropleth map of life expectancy

Figure 6.4: Choropleth map of life expectancy

choroplethr functions return ggplot2 graphs. Let’s make it a bit more attractive by modifying the code with additional ggplot2 functions.

country_choropleth(plotdata,
                   num_colors=9) +
  scale_fill_brewer(palette="YlOrRd") +
  labs(title = "Life expectancy by country",
       subtitle = "Gapminder 2007 data",
       caption = "source: https://www.gapminder.org",
       fill = "Years")
Choropleth map of life expectancy with labels and a better color scheme

Figure 6.5: Choropleth map of life expectancy with labels and a better color scheme

6.2.2 Data by US state

For US data, the choroplethr package provides functions for creating maps by county, state, zip code, and census tract. Additionally, map regions can be labeled.

Let’s plot US states by Mexcian American popultion, using the 2010 Census.

To plot the population data, we’ll use the state_choropleth function. The function requires that the data frame to be plotted has a column named region to represent state, and a column named value (the quantity to be plotted). Additionally, the entries in the region column must exactly match how the entries are named in the region column of the dataset state.map from the choroplethrMaps package.

The zoom = continental_us_states option will create a map that excludes Hawaii and Alaska.

library(ggplot2)
library(choroplethr)
data(continental_us_states)

# input the data
library(readr)
mex_am <- read_tsv("mexican_american.csv")

# prepare the data
mex_am$region <- tolower(mex_am$state)
mex_am$value <- mex_am$percent

# create the map
state_choropleth(mex_am, 
                 num_colors=9,
                 zoom = continental_us_states) +
  scale_fill_brewer(palette="YlOrBr") +
  labs(title = "Mexican American Population",
       subtitle = "2010 US Census",
       caption = "source: https://en.wikipedia.org/wiki/List_of_U.S._states_by_Hispanic_and_Latino_population",
       fill = "Percent") 
Choropleth map of US States

Figure 6.6: Choropleth map of US States

6.2.3 Data by US county

Finally, let’s plot data by US counties. We’ll plot the violent crime rate per 1000 individuals for Connecticut counties in 2012. Data come from the FBI Uniform Crime Statistics.

We’ll use the county_choropleth function. Again, the function requires that the data frame to be plotted has a column named region and a column named value.

Additionally, the entries in the region column must be numeric codes and exactly match how the entries are given in the region column of the dataset county.map from the choroplethrMaps package.

Our dataset has country names (e.g. fairfield). However, we need region codes (e.g., 9001). We can use the county.regions dataset to lookup the region code for each county name.

Additionally, we’ll use the option reference_map = TRUE to add a reference map from Google Maps.

library(ggplot2)
library(choroplethr)
library(dplyr)

# enter violent crime rates by county
crimes_ct <- data.frame(
  county = c("fairfield", "hartford", 
             "litchfield", "middlesex", 
             "new haven", "new london", 
             "tolland", "windham"),
  value = c(3.00, 3.32, 
            1.02, 1.24, 
            4.13, 4.61, 
            0.16, 1.60)
)

crimes_ct
##       county value
## 1  fairfield  3.00
## 2   hartford  3.32
## 3 litchfield  1.02
## 4  middlesex  1.24
## 5  new haven  4.13
## 6 new london  4.61
## 7    tolland  0.16
## 8    windham  1.60
# obtain region codes for connecticut
data(county.regions, 
     package = "choroplethrMaps")
region <- county.regions %>%
  filter(state.name == "connecticut")

region
##   region county.fips.character county.name  state.name
## 1   9001                 09001   fairfield connecticut
## 2   9003                 09003    hartford connecticut
## 3   9005                 09005  litchfield connecticut
## 4   9007                 09007   middlesex connecticut
## 5   9009                 09009   new haven connecticut
## 6   9011                 09011  new london connecticut
## 7   9013                 09013     tolland connecticut
## 8   9015                 09015     windham connecticut
##   state.fips.character state.abb
## 1                   09        CT
## 2                   09        CT
## 3                   09        CT
## 4                   09        CT
## 5                   09        CT
## 6                   09        CT
## 7                   09        CT
## 8                   09        CT
# join crime data to region code data
plotdata <- inner_join(crimes_ct, 
                       region, 
                       by=c("county" = "county.name"))
plotdata
##       county value region county.fips.character  state.name
## 1  fairfield  3.00   9001                 09001 connecticut
## 2   hartford  3.32   9003                 09003 connecticut
## 3 litchfield  1.02   9005                 09005 connecticut
## 4  middlesex  1.24   9007                 09007 connecticut
## 5  new haven  4.13   9009                 09009 connecticut
## 6 new london  4.61   9011                 09011 connecticut
## 7    tolland  0.16   9013                 09013 connecticut
## 8    windham  1.60   9015                 09015 connecticut
##   state.fips.character state.abb
## 1                   09        CT
## 2                   09        CT
## 3                   09        CT
## 4                   09        CT
## 5                   09        CT
## 6                   09        CT
## 7                   09        CT
## 8                   09        CT
# create choropleth map
county_choropleth(plotdata, 
                  state_zoom = "connecticut",
                  reference_map = TRUE,
                  num_colors = 8) +
  scale_fill_brewer(palette="YlOrRd") +
  labs(title = "Connecticut Violent Crime Rates",
       subtitle = "FBI 2012 data",
       caption = "source: https://ucr.fbi.gov",
       fill = "Violent Crime\n Rate Per 1000")
Choropleth map of violent crimes by Connecticut counties

Figure 6.7: Choropleth map of violent crimes by Connecticut counties

See the choroplethr help for more details.

R provides many ways to create chropleth maps. The choroplethr package is just one route. The tmap package provides another. A google search is sure to find others.