The area under the curve is given by
$$\int (1 - F(u))\,dG(u) = 1 - \int F(u)\,dG(u) \tag{8.10}$$
value than a randomly chosen class $\omega_1$ pattern is $\int G(u) f(u)\,du$. This is the same as the definition (8.11) for the area under the ROC curve.
Calculating the area under the ROC curve
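The area under the ROC curve is easily calculated by applying the classification rule to a test set and ranking the resulting values. Rank the values for all $n_1 + n_2$ test patterns in increasing order and let $r_i$ be the rank of the class $\omega_1$ pattern with the $i$th smallest value among the $n_1$ class $\omega_1$ patterns. Then $r_i - i$ class $\omega_2$ patterns have a smaller value than it, so the number of pairs (one pattern from each class) in which the class $\omega_2$ pattern has the smaller value is
$$\sum_{i=1}^{n_1} (r_i - i) = \sum_{i=1}^{n_1} r_i - \sum_{i=1}^{n_1} i = S_0 - \tfrac{1}{2} n_1 (n_1 + 1)$$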
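where $S_0$ is the sum of the ranks of the class $\omega_1$ test patterns. Since there are $n_1 n_2$ pairs in total, the proportion of pairs in which the class $\omega_2$ pattern has the smaller value is
$$\hat{A} = \frac{1}{n_1 n_2} \left\{ S_0 - \tfrac{1}{2} n_1 (n_1 + 1) \right\}$$
This estimate of the area under the ROC curve has been obtained using the rankings alone and has not used threshold values to calculate it.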
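As a minimal sketch of the rank-based calculation, the Python below pools the classifier values for the two classes, sums the ranks of the class $\omega_1$ patterns to obtain $S_0$, and subtracts $\tfrac{1}{2} n_1 (n_1 + 1)$ before dividing by $n_1 n_2$. The function names and the example scores are hypothetical, and the sketch assumes there are no tied values (ties would need average ranks). A brute-force pair count is included only as a cross-check.

```python
def auc_from_ranks(scores_class1, scores_class2):
    """Estimate the area under the ROC curve from rankings alone.

    scores_class1: classifier values for class omega_1 test patterns
    scores_class2: classifier values for class omega_2 test patterns
    Assumption: no tied values.
    """
    n1, n2 = len(scores_class1), len(scores_class2)
    # Pool the values, remembering which class each one came from.
    pooled = [(s, 1) for s in scores_class1] + [(s, 2) for s in scores_class2]
    pooled.sort(key=lambda pair: pair[0])
    # S0 is the sum of the 1-based ranks of the class omega_1 patterns.
    s0 = sum(rank for rank, (_, label) in enumerate(pooled, start=1) if label == 1)
    return (s0 - n1 * (n1 + 1) / 2) / (n1 * n2)


def auc_by_pair_counting(scores_class1, scores_class2):
    """Cross-check: fraction of pairs in which the omega_2 value is lower."""
    pairs = [(f, g) for f in scores_class1 for g in scores_class2]
    return sum(g < f for f, g in pairs) / len(pairs)


# Hypothetical test-set values, for illustration only.
f = [0.9, 0.8, 0.55, 0.4]       # class omega_1
g = [0.7, 0.5, 0.3, 0.2, 0.1]   # class omega_2
print(auc_from_ranks(f, g), auc_by_pair_counting(f, g))
```

For these illustrative values the class $\omega_1$ ranks are 4, 6, 8 and 9, so $S_0 = 27$ and both functions give $17/20$.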
The standard deviation of the statistic $\hat{A}$ is given by Hand and Till (2001) in terms of the quantities
$$\hat{\theta} = \frac{S_0}{n_1 n_2}$$
$$Q_0 = \tfrac{1}{6} (2n_1 + 2n_2 + 1)(n_1 + n_2) - Q_1$$
$$Q_1 = \sum_{j=1}^{n_1} (r_j - 1)^2$$
An alternative approach, considered by Bradley (1997), is to construct an estimate of the ROC curve directly for specific classifiers by varying a threshold and then to use an integration rule (for example, the trapezium rule) to obtain an estimate of the area beneath the curve.
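The threshold-based alternative can be sketched as follows. This is an illustration under stated assumptions rather than Bradley's procedure: it treats class $\omega_1$ as the positive class, uses every observed classifier value as a candidate threshold, and assigns a pattern to class $\omega_1$ when its value is at least the threshold. The function names and example scores are hypothetical.

```python
def roc_points(scores_class1, scores_class2):
    """Return ROC points (false positive rate, true positive rate).

    A pattern is assigned to class omega_1 when its value is >= the threshold.
    """
    n1, n2 = len(scores_class1), len(scores_class2)
    # Sweep thresholds from high to low; +inf first so the curve starts at (0, 0).
    observed = sorted(set(scores_class1) | set(scores_class2), reverse=True)
    thresholds = [float("inf")] + observed
    points = []
    for t in thresholds:
        tpr = sum(s >= t for s in scores_class1) / n1
        fpr = sum(s >= t for s in scores_class2) / n2
        points.append((fpr, tpr))
    return points


def trapezium_area(points):
    """Integrate a piecewise-linear curve through the points (trapezium rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical test-set values, for illustration only.
f = [0.9, 0.8, 0.55, 0.4]       # class omega_1
g = [0.7, 0.5, 0.3, 0.2, 0.1]   # class omega_2
points = roc_points(f, g)
print(trapezium_area(points))
```

With no tied values the empirical ROC curve is a step function, so the trapezium rule reproduces the proportion of correctly ordered pairs. With ties or a coarser threshold grid the two estimates can differ.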
The data
There are six data sets comprising measurements on two classes:
1. Cervical cancer. Six features, 117 patterns; classes are normal and abnormal cervical cell nuclei.
6. Heart disease 2. Eleven features, 261 patterns; classes are heart disease present and heart disease absent.
Incomplete patterns (patterns for which measurements on some features are missing) were removed from the data sets.
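A minimal sketch of this preprocessing step is given below. How missing measurements were encoded in the original data sets is not stated, so the sketch assumes, purely for illustration, that a missing feature appears as `None` or as a floating-point NaN. The function name and the sample patterns are hypothetical.

```python
import math


def drop_incomplete(patterns):
    """Keep only patterns in which every feature value is present.

    Assumption: a missing measurement is encoded as None or NaN.
    """
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [p for p in patterns if not any(is_missing(v) for v in p)]


# Hypothetical patterns with two features each.
patterns = [[1.0, 2.0], [None, 3.0], [4.0, float("nan")], [5.0, 6.0]]
complete = drop_incomplete(patterns)
print(len(patterns), len(complete))
```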