
Statistical Modeling of Dissimilarity Increments for d-dimensional Data: Application in Partitional Clustering

Aidos, H. ; Fred, A. L. N.

Pattern Recognition, Vol. 45, No. 9, pp. 3061-3071, September 2012.

ISSN (print): 0031-3203

Journal Impact Factor: 3.279 (in 2008)

Digital Object Identifier: 10.1016/j.patcog.2011.12.009

Abstract
This paper addresses the use of high-order dissimilarity models in data mining problems. We explore dissimilarities between triplets of nearest neighbors, called dissimilarity increments (DIs). We derive a statistical model of DIs for d-dimensional data (d-DID), assuming that the objects follow a multivariate Gaussian distribution. Empirical evidence shows that the d-DID is well approximated by the particular case d=2. We propose the application of this model in clustering, with a partitional algorithm that uses a merge strategy on Gaussian components. Experimental results on synthetic and real datasets show that clustering algorithms using DID usually outperform well-known clustering algorithms.
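
The abstract does not spell out how a dissimilarity increment is computed, so the following is only a minimal sketch under an assumed definition: for each point x_i, take its nearest neighbor x_j, then the nearest neighbor x_k of x_j other than x_i, and record |d(x_i, x_j) - d(x_j, x_k)|. The Euclidean distance, the function name `dissimilarity_increments`, and the helper `nearest` are illustrative choices and are not taken from the paper.

```python
import math


def dissimilarity_increments(points):
    """Return one dissimilarity increment per point (assumed definition).

    For each x_i: x_j is its nearest neighbor, x_k is the nearest
    neighbor of x_j excluding x_i, and the increment is
    |d(x_i, x_j) - d(x_j, x_k)| with Euclidean distance d.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points to form a triplet")

    def nearest(idx, exclude):
        # Hypothetical helper: index of the point closest to points[idx],
        # skipping idx itself and the optional excluded index.
        candidates = [m for m in range(n) if m != idx and m != exclude]
        return min(candidates, key=lambda m: math.dist(points[idx], points[m]))

    increments = []
    for i in range(n):
        j = nearest(i, exclude=None)   # nearest neighbor of x_i
        k = nearest(j, exclude=i)      # nearest neighbor of x_j, other than x_i
        d_ij = math.dist(points[i], points[j])
        d_jk = math.dist(points[j], points[k])
        increments.append(abs(d_ij - d_jk))
    return increments


if __name__ == "__main__":
    # Four collinear points with growing gaps between them.
    sample = [(0, 0), (1, 0), (3, 0), (7, 0)]
    print(dissimilarity_increments(sample))
```

The brute-force neighbor search is quadratic in the number of points, which keeps the sketch short; a spatial index would be the usual choice for larger datasets.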