The crosstab function calculates and prints a two-way frequency table.
Given a data frame, a row variable, a column variable, and a type (frequencies, cell percents, row percents, or column percents), the function returns a table. The table's contents are further controlled by the options na.rm (for example, na.rm = FALSE), total (for example, total = FALSE), and chisquare (for example, chisquare = TRUE).
Tables are printed with 2 decimal places for percents (modifiable using digits=#). Variables are coerced to factors if necessary. Adding plot=TRUE produces a ggplot2 graph instead of a table.
In the examples below, the number of car cylinders (cyl) is cross-tabulated with the number of gears (gear) for 32 automobiles in the cars74 data frame.
By default, the crosstab function reports frequency counts for each combination of the two categorical variables. The most common car type has 3 gears and 8 cylinders.
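As a rough cross-check of these counts, the sketch below builds the same two-way table with base R's table() rather than crosstab. It assumes that cars74 holds the same cyl and gear values as R's built-in mtcars data; that equivalence is an assumption, not something this section states.

```r
# Assumption: cars74 matches the built-in mtcars data for cyl and gear.
counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)
print(counts)

# Flatten the table to one row per cell (columns: cyl, gear, Freq)
# and pick the cell with the largest count.
cells <- as.data.frame(counts)
top <- cells[which.max(cells$Freq), ]
print(top)
```

Under that assumption, the largest cell is the one for 8 cylinders and 3 gears, in line with the statement above.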
Cell percents add up to 100% over all the cells in the table. 25% of all cars in the data frame have 4 gears and 4 cylinders.
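The arithmetic behind cell percents can be sketched in base R with prop.table(), which divides every cell by the grand total when no margin is given. This is an illustration only, not the crosstab implementation, and it again assumes that mtcars stands in for cars74.

```r
# Assumption: cars74 matches the built-in mtcars data for cyl and gear.
counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# With no margin argument, each cell is divided by the grand total.
cell_pct <- 100 * prop.table(counts)
print(round(cell_pct, 2))

# All cells together account for 100% of the cars.
print(sum(cell_pct))
```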
Row percents sum to 100% for each row of the table. 86% of 8-cylinder cars have 3 gears.
Column percents sum to 100% for each column of the table. Only 7% of 3-gear cars have 4 cylinders.
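The difference between row and column percents comes down to which total each cell is divided by. A minimal base R sketch of that distinction follows; it uses prop.table() with a margin argument rather than crosstab, and assumes mtcars carries the same values as cars74.

```r
# Assumption: cars74 matches the built-in mtcars data for cyl and gear.
counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# margin = 1 divides each cell by its row total; margin = 2 by its column total.
row_pct <- 100 * prop.table(counts, margin = 1)
col_pct <- 100 * prop.table(counts, margin = 2)

print(round(row_pct, 2))
print(round(col_pct, 2))

# Each row of row_pct and each column of col_pct sums to 100.
print(rowSums(row_pct))
print(colSums(col_pct))
```

The same cell therefore answers two different questions: the 8-cylinder, 3-gear cell is a share of all 8-cylinder cars in row_pct and a share of all 3-gear cars in col_pct.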
You can include a test that the two categorical variables are independent by adding the option chisquare = TRUE.
crosstab(cars74, cyl, gear, type = "colpercent", chisquare=TRUE)
#>         gear
#> cyl      gears3  gears4  gears5
#>   cyl4    6.67%  66.67%  40.00%
#>   cyl6   13.33%  33.33%  20.00%
#>   cyl8   80.00%   0.00%  40.00%
#>   Total 100.00% 100.00% 100.00%
#>
#> Chi-square = 18.04, df = 4, p = 0.0012
crosstab(cars74, cyl, gear, type = "colpercent", plot=TRUE,
         chisquare = TRUE)
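The chi-square line in the printed output can be checked independently with base R's chisq.test(). The sketch below is not how crosstab computes the test internally (the source does not say); it assumes that mtcars matches cars74 and that the reported statistic is an ordinary Pearson chi-square test on the table of counts.

```r
# Assumption: cars74 matches the built-in mtcars data for cyl and gear.
counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# Pearson chi-square test of independence on the count table.
# R warns that the approximation may be inaccurate because several
# expected counts are small; the warning is suppressed for this sketch.
test <- suppressWarnings(chisq.test(counts))

print(round(unname(test$statistic), 2))  # chi-square statistic
print(unname(test$parameter))            # degrees of freedom
print(round(test$p.value, 4))            # p-value
```

Under these assumptions the three values agree with the reported Chi-square = 18.04, df = 4, p = 0.0012. With expected counts this small, the p-value is best treated as approximate.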