Effective network management has been widely recognized as a grand challenge. Enterprises spend an increasing share of their IT budgets (up to 70% in 2008) merely maintaining their current operating environments rather than adding new, value-adding services and equipment. A $1000 server that incurs an annual maintenance cost of $2000 is a simple illustration of the management problem. This excessive cost of network management is mainly due to the large scale of networks, the heterogeneity of their technologies, and the subtle interactions among them. The goal of NeTS is to develop a novel network operation and management framework that departs from conventional approaches through a cross-disciplinary research collaboration based on hierarchical network abstraction modeling, machine learning through structure learning of probabilistic graphical models, and wavelet- and kernel-based signal processing.
NeTS will be deployed progressively, in multiple phases, in the operation and management of Portugal Telecom's (PT) high-definition IP/TV network to demonstrate the impact of the proposed research. High-definition IP/TV is one of PT's fastest-growing services and is regarded as a key strategic area for the company. The grand challenge of this complex network is to deliver a high-quality service efficiently and without failures, which requires novel management and operations systems that detect anomalies, identify their root causes, and predict future network behavior in real time. Existing network management systems for IP/TV generally focus on monitoring and on simple event-correlation analysis; operating with these systems still incurs excessive costs and delays. NeTS will be deployed online in part of PT's IP/TV network to assess its performance and the expected reduction in network operations and management costs. Network operators will play a key role in the assessment and fine-tuning of our systems.
Our approach to this research problem differs significantly from conventional approaches that focus on monitoring and correlation analysis. We bring together insights from network abstraction with statistical and heuristic machine learning methods and tie them closely together to build an intelligent network abstraction model. Networking insight shows that the network configuration dictates the behavior of the network control plane, and that the control plane in turn dictates the behavior of the data plane. We combine configuration, control-plane, and data-plane abstraction models with probabilistic graphical models to derive an intelligent and predictive network abstraction model of an enterprise network. A further novelty of our approach is the automatic learning of the graph structure using concepts from Bayesian model selection, information theory, and optimization. This will be done step by step, using first the static network configuration elements, then summarized and filtered dynamic control-plane information, and finally data-plane information. Once a graphical model of the network is obtained, we will perform inference on the graph to identify anomalies and their root causes and to predict future network behavior. Finally, we will explore the concept of forcing functions, which impose restrictions on the network configuration that simplify the graphical model and the inference process. This project leverages heavily the network abstraction modeling work carried out with PT over the last three years.
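To make the inference step more concrete, here is a minimal Python sketch, assuming a hand-built three-node chain that follows the configuration, control plane, data plane dependency described above. The node names (`C`, `P`, `D`), all probabilities, and the helpers `joint` and `posterior` are hypothetical and chosen only for illustration. In NeTS the graph structure and parameters would be learned from PT data rather than fixed by hand, and brute-force enumeration here merely stands in for whatever inference method the project eventually adopts.

```python
from itertools import product

# Hypothetical probabilities for illustration only (not measured PT data).
# Binary nodes: C = configuration fault, P = control-plane anomaly,
# D = data-plane anomaly, wired as the chain C -> P -> D.
P_CONFIG_FAULT = 0.05                     # prior P(C=1)
P_CTRL_GIVEN_CONFIG = {1: 0.90, 0: 0.02}  # P(P=1 | C=c)
P_DATA_GIVEN_CTRL = {1: 0.80, 0: 0.01}    # P(D=1 | P=p)


def bernoulli(p_one, value):
    """Probability of a binary outcome `value` when P(value=1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(c, p, d):
    """Chain-rule factorization P(C) * P(P | C) * P(D | P)."""
    return (bernoulli(P_CONFIG_FAULT, c)
            * bernoulli(P_CTRL_GIVEN_CONFIG[c], p)
            * bernoulli(P_DATA_GIVEN_CTRL[p], d))


def posterior(query, evidence):
    """P(query = 1 | evidence) by brute-force enumeration of all 2^3 states."""
    num = den = 0.0
    for c, p, d in product((0, 1), repeat=3):
        state = {"C": c, "P": p, "D": d}
        if any(state[name] != value for name, value in evidence.items()):
            continue  # state is inconsistent with the observed evidence
        weight = joint(c, p, d)
        den += weight
        if state[query] == 1:
            num += weight
    return num / den


if __name__ == "__main__":
    # Only a data-plane anomaly (e.g., degraded video) is observed.
    print(round(posterior("C", {"D": 1}), 3))  # 0.595
    print(round(posterior("P", {"D": 1}), 3))  # 0.845
```

With these illustrative numbers, observing only a data-plane anomaly raises the probability of an underlying configuration fault from the 5% prior to roughly 60%, which shows how evidence at the data plane can be propagated back toward candidate root causes. Exact enumeration grows exponentially with the number of nodes, so a model of a real network would need a more scalable inference algorithm.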
The direct impact of this research will be on the network operations and management of PT's IP/TV service. Although the actual impact cannot be quantified at this point, we anticipate a reduction of at least 30% in network operations and management costs. This estimate rests on the expectations that support teams will be directed to a single, highly likely root cause of a failure rather than to multiple possible failure points; that root causes will be detected within seconds rather than days; that service may be restored in minutes rather than days; and that network behavior and failures may be anticipated, thus reducing downtime by several orders of magnitude. We also expect that other services supported by enterprise networks may benefit from NeTS, because the network modeling and management functions researched in this project are generic to enterprise networks.
To achieve NeTS' goals, we formed an interdisciplinary team that includes experts in Networking (Dr. Sobrinho, IT/Lisboa; Dr. Kim, ECE/CMU; and Dr. Alegria, PT), Machine Learning (Dr. Cardoso, INESC/Porto, and Dr. Morla, INESC/Porto), Information Theory, Communications and Signal Processing (Dr. Rodrigues, IT/Porto), and Optimization (Dr. Xavier, ISR/Lisboa). Furthermore, we obtained a strong collaboration commitment from Portugal Telecom, which will contribute its operational experience and manpower during all phases of the research project (see attached letter of support from PT).
We plan to integrate several PhD students who are already enrolled in the CMU-Portugal program (e.g., Tiago Carvalho and Tiago Quelhas) or are in the process of applying to it. It is also important to note that we have created various PhD-level courses within the CMU-Portugal program on topics directly relevant to this research proposal, and have taken responsibility for advanced courses within the CMU-Portugal Master's program in Information Networking. Class projects for graduate and undergraduate students will also focus on research issues that contribute to the success of NeTS. Furthermore, PT employees enrolled in the Master's program will work on NeTS projects and continue this work when they return to PT after completing their degrees.
The CMU and Portuguese PIs will ensure that project development, milestone achievement, and partner coordination stay on track with the plan, and will take measures to correct any deviation from the project timetable. This will be facilitated by prior collaboration between the partners (e.g., between PT and CMU) and by shared PhD students.
|Start Date: 01-09-2010|
|End Date: 01-09-2013|
|Team: Miguel Raul Dias Rodrigues|
|Groups: Information Theory – Po|
|Local Coordinator: Miguel Raul Dias Rodrigues|