Predicting no-show medical appointments

appointments

Format

A data frame with 110527 rows and 14 variables:

PatientId

double. Identification of a patient.

AppointmentID

double. Identification of each appointment.

Gender

factor. Male, Female.

ScheduledDay

datetime. The day and time someone called or registered the appointment, which precedes the appointment itself.

AppointmentDay

datetime. The day of the actual appointment, when the patient has to visit the doctor.

Age

double. Age of the patient.

Neighbourhood

character. Where the appointment takes place.

Scholarship

integer. 0=FALSE, 1=TRUE. Scholarship is a social welfare program providing financial aid to poor Brazilian families.

Hypertension

integer. 0=FALSE, 1=TRUE.

Diabetes

integer. 0=FALSE, 1=TRUE.

Alcoholism

integer. 0=FALSE, 1=TRUE.

Handcap

integer. 0=FALSE, 1 or more=TRUE. The summary in the Examples shows values up to 4, so this is not strictly a 0/1 flag.

SMS_received

integer. 0=FALSE, 1=TRUE. Whether 1 or more messages were sent to the patient.

No_show

factor. Yes, No.

Source

Joni Hoppen, Kaggle Medical Appointment No Shows https://www.kaggle.com/joniarroba/noshowappointments.

Details

This Kaggle competition was designed to challenge participants to predict office no-shows. It is also a good dataset to practice date and time manipulation.
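
As an illustration of the date and time manipulation mentioned above, the following is a minimal sketch in base R. It uses a small toy data frame with made-up rows (an assumption, so that the snippet runs without the package) that only mirrors the ScheduledDay and AppointmentDay columns. The derived column names wait_days and appointment_weekday are hypothetical and not part of the dataset.

```r
# Toy rows with made-up values; only the column names mirror the dataset.
toy <- data.frame(
  ScheduledDay = as.POSIXct(
    c("2016-04-29 10:27:01", "2016-05-10 12:13:17", "2016-05-20 11:18:37"),
    tz = "UTC"
  ),
  AppointmentDay = as.POSIXct(
    c("2016-04-29 00:00:00", "2016-05-18 00:00:00", "2016-05-31 00:00:00"),
    tz = "UTC"
  )
)

# Compare calendar dates so the time of day of the booking does not matter.
scheduled_date   <- as.Date(toy$ScheduledDay, tz = "UTC")
appointment_date <- as.Date(toy$AppointmentDay, tz = "UTC")

# Hypothetical derived column: whole days between booking and appointment.
toy$wait_days <- as.integer(appointment_date - scheduled_date)

# Hypothetical derived column: ISO weekday of the appointment (1 = Monday).
toy$appointment_weekday <- as.integer(format(toy$AppointmentDay, "%u"))

toy$wait_days
#> [1]  0  8 11
```

Working on dates instead of raw timestamps avoids negative waits for same-day bookings, because AppointmentDay carries no time of day while ScheduledDay does.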

Examples

summary(appointments)
#>    PatientId         AppointmentID        Gender     
#>  Min.   :3.920e+04   Min.   :5030230   Male  :71840  
#>  1st Qu.:4.173e+12   1st Qu.:5640286   Female:38687  
#>  Median :3.173e+13   Median :5680573                 
#>  Mean   :1.475e+14   Mean   :5675305                 
#>  3rd Qu.:9.439e+13   3rd Qu.:5725524                 
#>  Max.   :1.000e+15   Max.   :5790484                 
#>   ScheduledDay                 AppointmentDay                     Age        
#>  Min.   :2015-11-10 07:13:56   Min.   :2016-04-29 00:00:00   Min.   : -1.00  
#>  1st Qu.:2016-04-29 10:27:01   1st Qu.:2016-05-09 00:00:00   1st Qu.: 18.00  
#>  Median :2016-05-10 12:13:17   Median :2016-05-18 00:00:00   Median : 37.00  
#>  Mean   :2016-05-09 07:49:15   Mean   :2016-05-19 00:57:50   Mean   : 37.09  
#>  3rd Qu.:2016-05-20 11:18:37   3rd Qu.:2016-05-31 00:00:00   3rd Qu.: 55.00  
#>  Max.   :2016-06-08 20:07:23   Max.   :2016-06-08 00:00:00   Max.   :115.00  
#>  Neighbourhood       Scholarship         Diabetes         Alcoholism    
#>  Length:110527      Min.   :0.00000   Min.   :0.00000   Min.   :0.0000  
#>  Class :character   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.0000  
#>  Mode  :character   Median :0.00000   Median :0.00000   Median :0.0000  
#>                     Mean   :0.09827   Mean   :0.07186   Mean   :0.0304  
#>                     3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.0000  
#>                     Max.   :1.00000   Max.   :1.00000   Max.   :1.0000  
#>     Handcap         SMS_received   No_show      Hypertension   
#>  Min.   :0.00000   Min.   :0.000   No :88208   Min.   :0.0000  
#>  1st Qu.:0.00000   1st Qu.:0.000   Yes:22319   1st Qu.:0.0000  
#>  Median :0.00000   Median :0.000               Median :0.0000  
#>  Mean   :0.02225   Mean   :0.321               Mean   :0.1972  
#>  3rd Qu.:0.00000   3rd Qu.:1.000               3rd Qu.:0.0000  
#>  Max.   :4.00000   Max.   :1.000               Max.   :1.0000
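
The summary above shows a minimum Age of -1 and Handcap values up to 4, so some cleaning may be useful before modelling. The following is a minimal sketch of one possible approach, not a prescribed method. It runs on a toy data frame with made-up rows (an assumption, so that it works without the package), and the names clean, any_handicap and rate_by_sms are hypothetical.

```r
# Toy rows with made-up values; only the column names mirror the dataset.
toy <- data.frame(
  Age          = c(-1, 18, 37, 55),
  Handcap      = c(0L, 0L, 2L, 1L),
  SMS_received = c(0L, 1L, 1L, 0L),
  No_show      = factor(c("No", "Yes", "No", "No"), levels = c("No", "Yes"))
)

# One possible choice: drop rows with a negative age.
clean <- toy[toy$Age >= 0, ]

# One possible choice: collapse Handcap to a logical flag.
clean$any_handicap <- clean$Handcap > 0

# Share of no-shows within each SMS_received group.
rate_by_sms <- tapply(clean$No_show == "Yes", clean$SMS_received, mean)
rate_by_sms
```

Whether to drop, impute, or keep such rows depends on the analysis; this sketch only shows where the checks would go.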