
Anonymized Data Assessment via Analysis of Variance: an Application to Higher Education Evaluation

Ferrão, M. E. ; Sousa, Paula Prata ; Fazendeiro, P.

Anonymized Data Assessment via Analysis of Variance: an Application to Higher Education Evaluation, Proc. International Conference on Computational Science and Its Applications (ICCSA), Athens, Greece, July 2023.



Abstract
The utility of an anonymized data set can be assessed by determining how much information is lost in anonymization. To investigate whether the relationships between variables degrade after anonymization, and thereby measure that loss, we perform an a posteriori analysis of variance. Several anonymized scenarios are compared with the original data, with differential privacy applied as the data anonymization process. We assess data utility based on the agreement between the original data structure and the anonymized structures. Data quality and utility are quantified by standard metrics, characteristics of the groups obtained. In addition, we use analysis of variance to show how estimates change. For illustration, we apply this approach to Brazilian Higher Education data, focusing on the main effects of interaction terms involving gender differentiation. The findings indicate that blindly using anonymized data for scientific purposes could undermine the validity of the conclusions.
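
As a rough illustration of the workflow the abstract describes, the following Python sketch perturbs a synthetic response with Laplace noise and compares one-way analysis of variance results before and after. Everything specific here is an assumption made for illustration and is not taken from the paper: the per-record Laplace mechanism with clipping bounds, the epsilon values, the two synthetic groups, and the helper names `laplace_noise`, `anonymize`, `one_way_anova`, and `demo`. The paper's actual anonymization settings and models may differ.

```python
import random
from statistics import fmean


def laplace_noise(scale, rng):
    """One Laplace(0, scale) draw: the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def anonymize(values, epsilon, lower, upper, rng):
    """Clip each value to [lower, upper], then add Laplace noise.

    Assumption: per-record perturbation with sensitivity (upper - lower).
    """
    scale = (upper - lower) / epsilon
    return [min(max(v, lower), upper) + laplace_noise(scale, rng) for v in values]


def one_way_anova(groups):
    """Return (F statistic, eta squared) for a dict of label -> list of values."""
    all_values = [x for xs in groups.values() for x in xs]
    grand_mean = fmean(all_values)
    means = {g: fmean(xs) for g, xs in groups.items()}
    # Variation explained by group membership vs. variation left inside groups.
    ss_between = sum(len(xs) * (means[g] - grand_mean) ** 2 for g, xs in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, xs in groups.items() for x in xs)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)
    return f_stat, eta_sq


def demo(seed=0):
    """Compare ANOVA results on original and anonymized synthetic scores."""
    rng = random.Random(seed)
    # Synthetic scores for two hypothetical groups (not the paper's data).
    original = {
        "group_a": [rng.gauss(60, 10) for _ in range(500)],
        "group_b": [rng.gauss(50, 10) for _ in range(500)],
    }
    results = {"original": one_way_anova(original)}
    for epsilon in (10.0, 1.0, 0.1):  # illustrative privacy budgets
        noisy = {g: anonymize(xs, epsilon, 0.0, 100.0, rng) for g, xs in original.items()}
        results[f"epsilon={epsilon}"] = one_way_anova(noisy)
    return results


if __name__ == "__main__":
    for scenario, (f_stat, eta_sq) in demo().items():
        print(f"{scenario:>12}: F = {f_stat:9.2f}, eta^2 = {eta_sq:.4f}")
```

Under these assumptions, the share of variance attributed to the group factor (eta squared) tends to shrink toward zero as epsilon decreases, which is the kind of change in estimates the abstract cautions about.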