Getting summary statistics for a quantitative variable is a very common task in data analysis. Unfortunately, R makes it surprisingly difficult.
The qstats function is an attempt to rectify the situation by making it simple to get any number of descriptive statistics for a numeric variable and to break these statistics down by the levels of one or more categorical variables (groups).
The general format is
qstats(data, variable, grouping variables, statistics, other options)
Note that variable names do not have to be quoted.
By default, the sample size, mean, and standard deviation are provided. Let’s take a look at fuel efficiencies for 11,914 automobiles in the cardata data frame.
# simple summary statistics
qstats(cardata, highway_mpg)
#> n mean sd
#> 1 11914 26.64 8.86
# summary statistics by vehicle_size
qstats(cardata, highway_mpg, vehicle_size)
#> vehicle_size n mean sd
#> 1 Compact 4764 28.94 9.58
#> 2 Large 2777 22.42 7.37
#> 3 Midsize 4373 26.80 7.91
# summary statistics by vehicle_size and drive type
qstats(cardata, highway_mpg, vehicle_size, driven_wheels)
#> vehicle_size driven_wheels n mean sd
#> 1 Compact all wheel drive 646 26.88 4.77
#> 2 Compact four wheel drive 407 20.79 2.90
#> 3 Compact front wheel drive 2491 33.26 9.89
#> 4 Compact rear wheel drive 1220 23.94 7.50
#> 5 Large all wheel drive 438 26.00 12.84
#> 6 Large four wheel drive 737 19.57 2.66
#> 7 Large front wheel drive 389 25.78 2.46
#> 8 Large rear wheel drive 1213 21.78 6.73
#> 9 Midsize all wheel drive 1269 25.83 4.41
#> 10 Midsize four wheel drive 259 18.85 2.51
#> 11 Midsize front wheel drive 1907 30.24 9.46
#> 12 Midsize rear wheel drive 938 23.32 5.16
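To make clear what a grouped breakdown like this computes, here is a minimal base R sketch of the same idea. It is not the qstats implementation. It uses the built-in mtcars data frame (not cardata) so that it runs without any extra packages, and the grouping variables cyl and gear are chosen purely for illustration.

```r
# Base R sketch (not how qstats is implemented): n, mean, and sd of mpg
# for each combination of cyl and gear in the built-in mtcars data.
by_group <- aggregate(
  mpg ~ cyl + gear,
  data = mtcars,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x))
)

# aggregate() stores the three statistics in a single matrix column;
# flatten it into ordinary columns and give them short names.
by_group <- do.call(data.frame, by_group)
names(by_group) <- c("cyl", "gear", "n", "mean", "sd")

print(by_group)
```

Groups containing a single car get an sd of NA, because a standard deviation needs at least two observations.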
Use the stats argument to choose which statistics are reported. You can pass a single statistic, or several statistics as a character vector of function names.
# single statistic
qstats(cardata, highway_mpg, vehicle_size, stats = "median")
#> vehicle_size median
#> 1 Compact 28
#> 2 Large 22
#> 3 Midsize 26
# multiple statistics
qstats(cardata, highway_mpg, vehicle_size,
       stats = c("median", "min", "max"))
#> vehicle_size median min max
#> 1 Compact 28 12 111
#> 2 Large 22 13 107
#> 3 Midsize 26 12 354
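Since statistics are given as names, it may help to see one way a vector of function names can be turned into results. The sketch below is only an illustration of the general technique, using base R's match.fun(). The helper apply_stats is hypothetical and is not part of qstats, and the example uses the built-in mtcars data rather than cardata.

```r
# Hypothetical helper (not part of qstats): look up each function by name
# and apply it to x, insisting that every result is a single number.
apply_stats <- function(x, stats) {
  vapply(stats, function(s) match.fun(s)(x), numeric(1))
}

# Returns a numeric vector named after the requested statistics
res <- apply_stats(mtcars$mpg, c("median", "min", "max"))
print(res)
```

Using vapply() with numeric(1) makes the call fail with an error if any function returns more than one value, which mirrors the single-number requirement for statistics.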
User-defined functions can also be used as statistics. The only requirement is that the function returns a single number.
# custom statistics
p25 <- function(x) quantile(x, probs = .25)
p75 <- function(x) quantile(x, probs = .75)
# calling the built-in and custom statistics
qstats(cardata, highway_mpg, vehicle_size,
       stats = c("min", "p25", "p75", "max"))
#> vehicle_size min p25 p75 max
#> 1 Compact 12 24 33 111
#> 2 Large 13 19 25 107
#> 3 Midsize 12 23 31 354
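Before passing a new function to qstats, it can be worth checking that it really returns a single number. As a sketch, here is a coefficient of variation statistic. The name cv and the test vector are made up for illustration, and the qstats call is shown only as a comment because it assumes the cardata data frame is available.

```r
# Hypothetical custom statistic: coefficient of variation (sd / mean), in percent
cv <- function(x) 100 * sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

# Sanity check on a small made-up vector: the result must be one number
result <- cv(c(20, 25, 30, 35))
stopifnot(is.numeric(result), length(result) == 1)

# Following the pattern above, the function is then referenced by name:
# qstats(cardata, highway_mpg, vehicle_size, stats = c("mean", "sd", "cv"))
```

By contrast, a function such as range() returns two values, so it would not meet the single-number requirement.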