Data Science Challenge on Data Streams

A Track in the IEEE Big Data 2019 Big Data Cup


  • A detailed tutorial with a code example is available. You can download the Starter pack here.

Submission protocol

The competition runs continuously for the specified duration: test and training data are issued as a stream, in batches whose size and release intervals are defined in the competition settings. The stream is made available to all participants at the same time and can be consumed independently. Since this is a competition on data streams, it is critical that all participants strictly respect the schedule and the deadlines.

  • Registration/Subscription: all participants are asked to register on the platform by providing an email address and a password. Users can then subscribe to the competition, and every user is issued a secret key to be used subsequently to secure bi-directional stream communication with the platform. Data release starts and ends according to the competition schedule. Data are issued in a predefined format, detailed in a Protobuf file, which users must also use to format their predictions for stream submission.
  • Submission: before starting, users must check that gRPC is installed in their environment. Test and training instances are released in batches at fixed time intervals. When a test batch is released, participants must submit their predictions before the specified due date; otherwise, they are penalized with a worst-case value for the evaluation metric used.
  • Evaluation and Results: submitted predictions are evaluated online and the results are updated live. For evaluation we will use one of the common metrics for time-series forecasting: Mean Absolute Percentage Error (MAPE), Mean Squared Error (MSE), or Mean Absolute Error (MAE). The evaluation metric will be defined in the competition description. Intermediate results and rankings can be viewed in the web application, which provides a user-friendly leaderboard.
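As an illustration, the three candidate metrics can be computed in a few lines of Python (a minimal sketch; the server-side implementation, and in particular any percentage scaling of MAPE, may differ in detail):

```python
def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean Squared Error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Example batch: actual values vs. a model's predictions
actual    = [100.0, 200.0, 50.0]
predicted = [110.0, 190.0, 40.0]
print(mae(actual, predicted))             # → 10.0
print(mse(actual, predicted))             # → 100.0
print(round(mape(actual, predicted), 2))  # → 11.67
```

Note that lower is better for all three metrics, which is why a late submission is penalized with a worst-case (high) value.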

  • Rules of the competition:

    • Employees of the company that will provide the data can participate in the competition but cannot compete for prizes.
    • Participants must submit their predictions within the time interval specified in the competition for each batch of instances. If a prediction is submitted late, the default value defined in the competition description is used for evaluation.
    • Participants are allowed to use any additional libraries and resources for this competition, including hardware setup, software setup, programming language, and additional data. The only limitation is that the chosen programming language must be supported by Protobuf and gRPC.
    • Predictions must be submitted in the appropriate format, defined using Protobuf and available in the competition description. If these instructions are not followed, participants are penalized in the same way as for late predictions.
    • After the competition is over, the top 3 participants must submit their code to validate the results. Code submission will be explained in the competition description.
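To illustrate how late or malformed submissions affect a score, here is a small Python sketch of per-batch scoring under MAE (the helper names and the default value of 100.0 are hypothetical illustrations; the actual evaluation is performed server-side with the metric and default value stated in the competition description):

```python
def score_batch(y_true, y_pred, submitted_on_time, default_error):
    """MAE for one batch, or the penalty default if the submission was late or malformed."""
    if not submitted_on_time or y_pred is None or len(y_pred) != len(y_true):
        return default_error
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Three batches: (actual values, submitted predictions, arrived on time?)
batches = [
    ([10.0, 12.0], [11.0, 12.0], True),
    ([ 9.0, 11.0], [ 9.5, 10.0], False),  # late -> worst-case penalty applies
    ([13.0, 14.0], [13.0, 15.0], True),
]
DEFAULT = 100.0  # hypothetical worst-case value from the competition description
scores = [score_batch(t, p, ok, DEFAULT) for t, p, ok in batches]
print(scores)  # → [0.5, 100.0, 0.5]
```

The simulation makes the incentive concrete: a single missed deadline can dominate an otherwise low average error, so clients should submit on time even with a rough prediction.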


    Prizes will be awarded to the top three participants. To be eligible to claim a prize, each of the top three participants must provide their code so that the results can be confirmed. The prizes are as follows:

    • First place: $1,000
    • Second place: $500
    • Third place: $250