The Area under the ROC Curve as a Criterion for Clustering Evaluation

Aidos, H. ; Duin, R. P. W. ; Fred, A. L. N.

The Area under the ROC Curve as a Criterion for Clustering Evaluation, Proc International Conf. on Pattern Recognition Applications and Methods - ICPRAM, Barcelona, Spain, Vol. 0, pp. 276 - 280, February, 2013.

In the literature, there are several criteria for validating a clustering partition. These criteria can be external or internal, depending on whether they use prior information about the true class labels or only the data itself. All of them assume a fixed number of clusters k and measure the performance of a clustering algorithm for that k. Instead, we propose a measure of the robustness of an algorithm across several values of k: it constructs a ROC curve and computes the area under that curve. We present ROC curves of a few clustering algorithms for several synthetic and real-world datasets and show which algorithms are less sensitive to the choice of the number of clusters, k. We also show that this measure can be used as a validation criterion in a semi-supervised context, and empirical evidence shows that not all objects need to be labeled to validate the clustering partition.
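
The abstract does not spell out how the ROC curve is built, so the following Python sketch is only one plausible reading, not the authors' definition. It assumes that each partition (one per value of k) is scored on pairs of objects: a pair from the same true class placed in the same cluster counts as a true positive, and a pair from different classes placed in the same cluster counts as a false positive. Each k then gives one point on the curve, and the area is computed with the trapezoidal rule. The function names `pairwise_rates` and `auc_over_k` are hypothetical, and under this assumed convention a larger area is better, which may differ from the convention used in the paper.

```python
import numpy as np


def pairwise_rates(true_labels, cluster_labels):
    """Return (false positive rate, true positive rate) over all object pairs.

    Assumed convention (not taken from the paper): a "positive" pair is one
    whose two objects are placed in the same cluster.
    """
    t = np.asarray(true_labels)
    c = np.asarray(cluster_labels)
    upper = np.triu_indices(len(t), k=1)  # each unordered pair once
    same_class = (t[:, None] == t[None, :])[upper]
    same_cluster = (c[:, None] == c[None, :])[upper]
    pos = same_class.sum()
    neg = (~same_class).sum()
    tpr = (same_class & same_cluster).sum() / pos if pos else 0.0
    fpr = (~same_class & same_cluster).sum() / neg if neg else 0.0
    return float(fpr), float(tpr)


def auc_over_k(true_labels, partitions):
    """Area under the curve traced by one (fpr, tpr) point per partition."""
    points = [pairwise_rates(true_labels, p) for p in partitions]
    # Anchor the curve at (0, 0) and (1, 1), then order points by fpr.
    points = sorted(set(points + [(0.0, 0.0), (1.0, 1.0)]))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoidal rule
    return area


# Toy usage: four objects in two classes, partitions for k = 1, 2, 4.
truth = [0, 0, 1, 1]
partitions = [[0, 0, 0, 0], [0, 0, 1, 1], [0, 1, 2, 3]]
print(auc_over_k(truth, partitions))
```

Because only the labeled objects enter the pair counts, the same functions could be applied to a labeled subset of the data, which is in the spirit of the semi-supervised use mentioned in the abstract.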