Predicting medical expenses

insurance

Format

A data frame with 1338 rows and 7 variables:

age

integer. Age of the primary beneficiary.

sex

character. Gender of the insurance contractor (female, male).

bmi

double. Body mass index (kg / m ^ 2): weight divided by the square of height, an objective index of whether body weight is relatively high or low for a given height. Ideally 18.5 to 24.9.

children

integer. Number of children (dependents) covered by the health insurance.

smoker

character. Whether the beneficiary smokes (yes, no).

region

character. The beneficiary's residential area in the US (northeast, southeast, southwest, northwest).

charges

double. Individual medical costs billed by health insurance (in dollars).

Source

From Machine Learning with R by Brett Lantz. Data downloaded from GitHub.

Examples

summary(insurance)
#>       age            sex                 bmi           children    
#>  Min.   :18.00   Length:1338        Min.   :15.96   Min.   :0.000  
#>  1st Qu.:27.00   Class :character   1st Qu.:26.30   1st Qu.:0.000  
#>  Median :39.00   Mode  :character   Median :30.40   Median :1.000  
#>  Mean   :39.21                      Mean   :30.66   Mean   :1.095  
#>  3rd Qu.:51.00                      3rd Qu.:34.69   3rd Qu.:2.000  
#>  Max.   :64.00                      Max.   :53.13   Max.   :5.000  
#>     smoker             region             charges     
#>  Length:1338        Length:1338        Min.   : 1122  
#>  Class :character   Class :character   1st Qu.: 4740  
#>  Mode  :character   Mode  :character   Median : 9382  
#>                                        Mean   :13270  
#>                                        3rd Qu.:16640  
#>                                        Max.   :63770
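
The bmi definition under Format (weight in kilograms divided by the square of height in metres) can be illustrated with a minimal sketch. The helper name bmi_from and the input values are hypothetical: they are not part of the dataset or its package, and the inputs are not rows from insurance.

```r
# Hypothetical helper, for illustration only:
# BMI = weight (kg) / height (m)^2
bmi_from <- function(weight_kg, height_m) {
  weight_kg / height_m^2
}

# Illustrative inputs (assumed values, not taken from `insurance`)
example_bmi <- bmi_from(weight_kg = 70, height_m = 1.75)
round(example_bmi, 2)
#> [1] 22.86

# Check the value against the 18.5 to 24.9 range given in the bmi description
in_ideal_range <- example_bmi >= 18.5 & example_bmi <= 24.9
in_ideal_range
#> [1] TRUE
```

Because the function is vectorised, the same check could be applied to a whole column; in the summary above, the median bmi of 30.40 sits above that range.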