Statistical Modeling of Dissimilarity Increments for d-dimensional Data: Application in Partitional Clustering
Fred, A. L. N.
Pattern Recognition, Vol. 45, No. 9, pp. 3061-3071, September 2012.
ISSN (print): 0031-3203
Journal Impact Factor: 3.279 (in 2008)
Digital Object Identifier: 10.1016/j.patcog.2011.12.009
This paper addresses the use of high-order dissimilarity models in data mining problems. We explore dissimilarities between triplets of nearest neighbors, called dissimilarity increments (DIs). We derive a statistical model of DIs for d-dimensional data (d-DID), assuming that the objects follow a multivariate Gaussian distribution. Empirical evidence shows that the d-DID is well approximated by the particular case d=2. We propose applying this model to clustering, through a partitional algorithm that uses a merge strategy on Gaussian components. Experimental results on synthetic and real datasets show that clustering algorithms using the DID usually outperform well-known clustering algorithms.
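To make the notion of a dissimilarity increment concrete, the following is a minimal sketch of how DIs might be computed from a data matrix. It assumes the definition commonly used in earlier work on dissimilarity increments, which the abstract does not spell out: for each point x_i, take its nearest neighbor x_j, then the nearest neighbor x_k of x_j other than x_i, and set the increment to |d(x_i, x_j) - d(x_j, x_k)| with d the Euclidean distance. The function name `dissimilarity_increments` is hypothetical, and the sketch covers neither the d-DID model nor the clustering algorithm.

```python
import numpy as np


def dissimilarity_increments(X):
    """Return one dissimilarity increment per row of X (hypothetical helper).

    Assumed definition (not stated in the abstract): for point i, let j be
    its nearest neighbor and k the nearest neighbor of j other than i; the
    increment is |d(i, j) - d(j, k)| under the Euclidean distance.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < 3:
        raise ValueError("at least three points are needed to form a triplet")

    # Full pairwise Euclidean distance matrix; fine for small n.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)  # a point is never its own neighbor

    increments = np.empty(n)
    for i in range(n):
        j = int(np.argmin(D[i]))      # nearest neighbor of i
        row = D[j].copy()
        row[i] = np.inf               # exclude i when searching from j
        k = int(np.argmin(row))       # nearest neighbor of j other than i
        increments[i] = abs(D[i, j] - D[j, k])
    return increments


if __name__ == "__main__":
    # Draw samples from a 2-D Gaussian, matching the model's assumption.
    rng = np.random.default_rng(0)
    sample = rng.normal(size=(200, 2))
    di = dissimilarity_increments(sample)
    print("mean increment:", di.mean())
```

A histogram of the returned values for Gaussian samples of different dimensions would be one way to examine empirically how closely the distribution of increments follows the d=2 case.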