Extracting Information from Interval Data Using Symbolic Principal Component Analysis
de Oliveira, M.R.O; Vilela, M.; Pacheco, A.; Valadas, R.; Salvador, P.
Austrian Journal of Statistics Vol. 46, Nº 3-4, pp. 79 - 87, April, 2017.
ISSN (print): 1026-597X
ISSN (online):
Scimago Journal Ranking: 0,23 (in 2017)
Digital Object Identifier: 10.17713/ajs.v46i3-4.673
Abstract
We introduce generic definitions of symbolic variance and covariance for random interval-valued variables that lead to a unified and insightful interpretation of four known symbolic principal component estimation methods: CPCA, VPCA, CIPCA, and SymCovPCA. Moreover, we propose the use of truncated versions of symbolic principal components, which use a strict subset of the original symbolic variables, as a way to improve the interpretation of symbolic principal components. Furthermore, the analysis of a real dataset leads to a meaningful characterization of Internet traffic applications, while highlighting similarities between the symbolic principal component estimation methods considered in the paper.