Data Science Challenge on Data Streams

A Track in the IEEE Big Data 2019 Big Data Cup


  • This is a machine learning competition on data streams: the data will be released as a stream once the competition starts.
  • Telecommunication companies and network providers closely monitor the behavior of their networks to ensure high-quality service. They collect network traffic data in real time and need to extract knowledge from it, so that they can predict future behavior for capacity planning or detect anomalies caused by malicious attacks or malfunctioning devices. It is therefore important to process the data online and make fast decisions when required. Our industrial partners will provide data that was collected continuously from their networks. The data consists of many recorded parameters related to device location, signal quality, packet loss and more. The parameters can be categorical, continuous or discrete. Our competition addresses a network activity analysis scenario that arises in several prediction use cases:
    • Capacity planning and activity prediction: Predicting network metrics, such as the number of devices and the number of messages passing through the network, allows companies to anticipate future behavior and deploy resources when needed, for example by expanding the network with new devices.
    • Anomaly detection: The evolution of certain metrics, such as signal strength, noise ratio, and packet loss, may reveal malfunctioning devices in the network. The goal is to predict the future values of these metrics and spot abnormal ones.
  • The detailed description of the dataset will be released soon.
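To make the anomaly detection use case more concrete, here is a minimal sketch of one possible online approach: an exponentially weighted forecast of a single metric, with a reading flagged as abnormal when it deviates too far from that forecast. The competition does not prescribe this method. The class name, the parameters (alpha, threshold, warmup) and the sample readings are all hypothetical and chosen only for illustration.

```python
import math


class StreamingAnomalyDetector:
    """Illustrative online forecaster with a deviation-based anomaly rule.

    Not part of the competition material: names and defaults are assumptions.
    """

    def __init__(self, alpha=0.2, threshold=3.0, warmup=5):
        self.alpha = alpha          # smoothing factor for the running estimates
        self.threshold = threshold  # allowed deviation, in standard deviations
        self.warmup = warmup        # number of readings seen before flagging starts
        self.mean = None            # running forecast of the next value
        self.var = 0.0              # running estimate of the forecast error variance
        self.count = 0

    def predict(self):
        """Return the forecast for the next reading (None before any data)."""
        return self.mean

    def update(self, value):
        """Score one reading against the forecast, then learn from it.

        Returns True if the reading is flagged as abnormal.
        """
        self.count += 1
        if self.mean is None:
            self.mean = float(value)
            return False

        error = value - self.mean
        std = math.sqrt(self.var)
        is_anomaly = (
            self.count > self.warmup
            and std > 0
            and abs(error) > self.threshold * std
        )
        if not is_anomaly:
            # Learn only from normal readings so that outliers
            # do not distort the forecast.
            self.mean += self.alpha * error
            self.var = (1 - self.alpha) * (self.var + self.alpha * error ** 2)
        return is_anomaly


# Made-up signal-strength readings arriving one at a time.
readings = [10.0, 10.5, 9.5, 10.2, 9.8, 10.1, 9.9, 50.0, 10.0]
detector = StreamingAnomalyDetector()
flags = [detector.update(r) for r in readings]
```

Each reading is processed once and then discarded, which matches the streaming setting described above. In this sketch only the reading of 50.0 is flagged; a real solution would likely need a richer forecasting model and per-metric tuning.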