Data Clustering Validation using Constraints

Duarte, J.; Fred, A. L. N.; Duarte, F. Jorge

Data Clustering Validation using Constraints, Proc. International Conf. on Knowledge Discovery and Information Retrieval - KDIR, Vilamoura, Algarve, Portugal, Vol. 1, pp. 17 - 27, September, 2013.

Digital Object Identifier: 10.5220/0004543800170027

Much attention is being given to the incorporation of constraints into data clustering, mainly expressed in
the form of must-link and cannot-link constraints between pairs of domain objects. However, their inclusion
in the important clustering validation process has so far been disregarded. In this work, we integrate the use
of constraints in clustering validation. We propose three approaches to accomplish this: produce a weighted
validity score that combines a traditional validity index with the constraint satisfaction ratio; learn a new distance
function or feature space representation that better suits the constraints, and use it with a validation index;
and a combination of the previous two. Experimental results on 14 synthetic and real data sets show that
including the information provided by the constraints increases the performance of the clustering validation
process in selecting the best number of clusters.
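
The first approach, a weighted validity score, can be illustrated with a minimal Python sketch. The abstract does not state the weighting formula, so the convex combination below, the parameter `alpha`, the function names, and the validity index values in the usage example are all assumptions made for illustration and are not taken from the paper. The validity index is assumed to be already normalized to [0, 1], with higher values being better.

```python
def constraint_satisfaction_ratio(labels, must_link, cannot_link):
    """Fraction of pairwise constraints that a partition satisfies.

    labels      : cluster label of each object, indexed by object id
    must_link   : pairs (i, j) that should share a cluster
    cannot_link : pairs (i, j) that should be in different clusters
    """
    total = len(must_link) + len(cannot_link)
    if total == 0:
        return 1.0  # no constraints, so nothing is violated
    satisfied = sum(labels[i] == labels[j] for i, j in must_link)
    satisfied += sum(labels[i] != labels[j] for i, j in cannot_link)
    return satisfied / total


def weighted_validity(index_value, labels, must_link, cannot_link, alpha=0.5):
    """Assumed form: alpha * index + (1 - alpha) * satisfaction ratio."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    ratio = constraint_satisfaction_ratio(labels, must_link, cannot_link)
    return alpha * index_value + (1.0 - alpha) * ratio


# Hypothetical candidate partitions of six objects, keyed by number of clusters.
partitions = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
}
# Hypothetical normalized validity index values for each candidate.
index_values = {2: 0.6, 3: 0.7}

must_link = [(0, 1), (3, 4)]
cannot_link = [(2, 3)]

scores = {
    k: weighted_validity(index_values[k], labels, must_link, cannot_link)
    for k, labels in partitions.items()
}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

In this toy example the validity index alone would favour three clusters, but the three-cluster partition violates two of the three constraints, so the combined score selects two clusters instead. The choice of `alpha` controls how much the constraints can override the index.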