Breast cancer data

breast

Format

A data frame with 286 rows and 10 variables:

age

factor. 20-29, 30-39, 40-49, 50-59, 60-69, 70-79.

menopause

factor. lt40, ge40, premeno.

tumor.size

factor. 0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54.

inv.nodes

factor. 0-2, 3-5, 6-8, 9-11, 12-14, 15-17, 18+.

node.caps

factor. yes, no. Contains 8 missing values.

deg.malig

factor. 1, 2, 3. Higher numbers indicate greater malignancy.

breast

factor. left, right.

breast.quad

factor. left-up, left-low, right-up, right-low, central. Contains 1 missing value.

irradiate

factor. yes, no.

recurrence

factor. yes, no.

Source

This breast cancer domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for providing the data. Downloaded from OpenML.

Note

recurrence is the response or outcome variable.
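
Because recurrence is the outcome, a natural first look is its rate within each level of a predictor. The following is a minimal base R sketch, not part of the package: `recurrence_rate` is a hypothetical helper, and the `toy` data frame is made up solely so the snippet runs on its own.

```r
# Hypothetical helper (not part of the package): proportion of "yes"
# outcomes within each level of a factor predictor.
recurrence_rate <- function(data, predictor, outcome = "recurrence") {
  # table() drops NA entries by default, so rows with a missing
  # predictor or outcome are ignored.
  tab <- table(data[[predictor]], data[[outcome]])
  # Row-wise proportions: each predictor level sums to 1.
  prop.table(tab, margin = 1)[, "yes"]
}

# Made-up toy rows with the same column types, for illustration only.
toy <- data.frame(
  irradiate  = factor(c("no", "no", "yes", "yes"), levels = c("no", "yes")),
  recurrence = factor(c("no", "yes", "yes", "yes"), levels = c("no", "yes"))
)
recurrence_rate(toy, "irradiate")

# With the real data loaded, the same call would be:
# recurrence_rate(breast, "irradiate")
```

A predictor level with no observations yields NaN, since its row of the table sums to zero.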

Examples

summary(breast)
#>     age       menopause     tumor.size inv.nodes   node.caps  deg.malig
#>  20-29: 1   ge40   :129   30-34  :60   0-2  :213   no  :222   1: 71    
#>  30-39:36   lt40   :  7   25-29  :54   3-5  : 36   yes : 56   2:130    
#>  40-49:90   premeno:150   20-24  :50   6-8  : 17   NA's:  8   3: 85    
#>  50-59:96                 15-19  :30   9-11 : 10                       
#>  60-69:57                 10-14  :28   12-14:  3                       
#>  70-79: 6                 40-44  :22   15-17:  6                       
#>                           (Other):42   18+  :  1                       
#>    breast       breast.quad  irradiate recurrence
#>  left :152   central  : 21   no :218   no :201   
#>  right:134   left-low :110   yes: 68   yes: 85   
#>              left-up  : 97                       
#>              right-low: 24                       
#>              right-up : 33                       
#>              NA's     :  1                       
#>
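
The summary above reports NA's in node.caps and breast.quad, so it may be worth counting missing values before modelling. The following is a minimal base R sketch under stated assumptions: `count_missing` is a hypothetical helper, not part of the package, and `toy` is a made-up data frame used only to keep the snippet self-contained.

```r
# Hypothetical helper (not part of the package): number of NA entries
# in each column of a data frame.
count_missing <- function(data) {
  vapply(data, function(column) sum(is.na(column)), integer(1))
}

# Made-up toy rows, for illustration only.
toy <- data.frame(
  node.caps   = factor(c("yes", NA, "no")),
  breast.quad = factor(c("central", "left-up", NA)),
  recurrence  = factor(c("no", "yes", "no"))
)
count_missing(toy)

# Number of rows with no missing value in any column.
sum(complete.cases(toy))

# With the real data loaded, the same calls would be:
# count_missing(breast)
# sum(complete.cases(breast))
```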