A Complete Review on the Application of Statistical Methods for Evaluating Internet Traffic Usage
; Zavala, A.
; Magoni, D.
Inácio, P. R. M. I.
IEEE Access Vol. 10, Nº 1, pp. 128433 - 128455, December, 2022.
ISSN (print): 2169-3536
Scimago Journal Ranking: 0,93 (in 2022)
Digital Object Identifier: 10.1109/ACCESS.2022.3227073
Internet traffic classification aims to identify the kind of Internet traffic. With the rise of traffic
encryption and multi-layer data encapsulation, some classic classification methods have lost their strength.
In an attempt to increase classification performance, Machine Learning (ML) strategies have gained the
interest of the scientific community and have shown promise for the future of traffic classification,
mainly in the recognition of encrypted traffic. However, some of these methods consume substantial computational
resources, which makes them unfeasible for classifying large traffic flows or for real-time classification.
Methods using statistical analysis have been used to classify real-time traffic or large traffic flows, where the
main objective is to find statistical differences among flows or find a pattern in traffic characteristics through
statistical properties that allow traffic classification. The purpose of this work is to address statistical methods
to classify Internet traffic that have been little explored or unexplored in the literature. This work does not focus
on statistical methodology in general. It focuses on statistical tools applied to Internet traffic
classification. Thus, we provide an overview of statistical distances and divergences previously used or with
potential to be used in the classification of Internet traffic. Then, we review previous works about Internet
traffic classification using statistical methods, namely Euclidean, Bhattacharyya, and Hellinger distances,
Jensen-Shannon and Kullback–Leibler (KL) divergences, Support Vector Machines (SVM), Correlation
Information (Pearson Correlation), Kolmogorov-Smirnov and Chi-Square tests, and Entropy. We also discuss
some open issues and future research directions for Internet traffic classification using statistical methods.
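
To make the distances and divergences named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes each flow has already been summarized as a histogram over the same bins (for example, packet-size bins). The function names, the bin counts, and the nearest-profile rule in `classify` are all hypothetical and chosen only for illustration.

```python
import math


def normalize(counts):
    """Turn raw bin counts into a discrete probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one observation")
    return [c / total for c in counts]


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def bhattacharyya(p, q):
    # Bhattacharyya coefficient: overlap between the two distributions.
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return -math.log(bc) if bc > 0 else math.inf


def hellinger(p, q):
    # Bounded in [0, 1] for probability distributions.
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; not symmetric."""
    total = 0.0
    for a, b in zip(p, q):
        if a == 0:
            continue  # 0 * log(0 / b) is taken as 0
        if b == 0:
            return math.inf  # p has mass where q has none
        total += a * math.log2(a / b)
    return total


def jensen_shannon(p, q):
    """Symmetric, finite divergence built from KL against the mixture."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def entropy(p):
    """Shannon entropy in bits."""
    return -sum(a * math.log2(a) for a in p if a > 0)


def classify(flow, profiles, distance=hellinger):
    """Hypothetical rule: return the label of the closest reference profile."""
    return min(profiles, key=lambda label: distance(flow, profiles[label]))


if __name__ == "__main__":
    # Made-up packet-size histograms (same four bins for every flow).
    profiles = {
        "bulk": normalize([5, 10, 20, 65]),
        "interactive": normalize([60, 25, 10, 5]),
    }
    unknown = normalize([50, 30, 15, 5])
    print(classify(unknown, profiles))
    print(jensen_shannon(unknown, profiles["interactive"]))
```

Hellinger distance is used as the default in `classify` only because it is symmetric and bounded. Any of the other measures can be passed through the `distance` argument, keeping in mind that KL divergence is asymmetric and can be infinite when a bin is empty in the reference profile.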