Junior Conference on Data Science and Engineering
Paris-Saclay, 14th-15th September 2017


The second edition of the Paris-Saclay Junior Conference on Data Science and Engineering is aimed at first-year PhD students, M2 students and third-year engineering school students at Paris-Saclay. It offers these students the opportunity to present scientific work carried out during internships or in the first year of their thesis, and to sharpen their critical thinking at a professional conference hosting prestigious invited speakers, academics and industry scientists.

The conference aims to gather a large audience of master's, engineering school and PhD students, and is an excellent way to discover the research world in Data Science and Engineering.

This year, PhD students are also involved in the conference organization as reviewers, session chairs and organizers of networking events.
Please contact us if you are interested in joining the team.

Previous year's program

Follow us on Twitter! #JDSE2017

Photo credits: I. Manolescu, M2 BIBS students, L. Pauleve


Katharina Morik
Professor at the TU Dortmund University, Germany

Machine Learning under Resource Constraints

Abstract: Big data are produced by various sources, often distributed over several measuring entities. Whereas the sensors have restricted capabilities, the compute clusters where the data are stored are usually very fast. In both cases, energy consumption needs to be restricted. In this talk, the interplay of data analysis at the sensors and in the cloud, together with the application of the analysis, is explained. We discuss opportunities for using sophisticated models for learning spatio-temporal models. In particular, we investigate graphical models, which generate the probabilities for connected (sensor) nodes. We even approximate likelihood estimates such that they can be computed on very restricted devices.

Ulf Leser
Professor at the Institute for Computer Science, Humboldt-Universität zu Berlin, Germany

Web-Scale Domain-Specific Information Extraction

Abstract: Information Extraction (IE) from unstructured texts is a technology with growing importance in many applications. Three important challenges to IE are the achievement of high quality results, scalability of methods to very large corpora, and integration of IE results with other data for downstream analysis. In this talk, we will highlight recent advances and open questions in these areas by drawing from extensive experiences in developing and applying IE for biomedical research.

Olivier Bousquet
Head of Machine Learning Research, Google, Zürich, Switzerland

Building Intelligent Systems at Scale with Deep Learning

Abstract: Recent advances in Deep Learning are at the heart of the current AI boom. We will review some of these advances in areas such as image understanding, machine translation or robotics, and analyze the factors that can explain the observed progress, from the availability of large amounts of labeled data to new computing paradigms as well as algorithmic and scientific insights. Looking forward, we will discuss the remaining scientific and technical challenges and highlight some of the promising research directions in the field.

Preliminary Program

This is a preliminary program. The schedule is subject to small changes.
The talks will be in English.

Thursday 14 September 2017

9:40 - 10:00: Coffee

10:00 - 11:45: Morning Session I

Databases and ontologies
10:00 - 10:15: Welcome message
10:15 - 11:15: Keynote Ulf Leser (Humboldt University) Web-Scale Domain-Specific Information Extraction
11:15 - 11:30: On the Automatic Distribution and Parallelization of a miRNA Prediction Algorithm using Spark and Mesos frameworks. Alexandre Protat, Laurent Poligny, Nazim Agoulmine and Fariza Tahi. IBISC, U Evry
11:30 - 11:35: Repairing regular expressions by adding missing words. Thomas Rebele, Katerina Tzompanaki and Fabian M. Suchanek. Telecom ParisTech, U Cergy-Pontoise
11:35 - 11:40: Grouping Answers in Ontology-Based Query Answering: the RDFS case. Maxime Buron and Michaël Thomazo. LIX, Inria
11:40 - 11:45: Forecasting bike sharing demand. Aurélie Fréchet. EDF R&D

11:45 - 12:15: Coffee break

12:15 - 13:15: Morning Session II

Community, graph
12:15 - 12:30: Community recovery in stochastic block models using semi-definite programming. Youssouf Emin and Ismael Lemhadri. Ecole polytechnique
12:30 - 12:45: Phase transitions in hierarchical community detection. Stefano Sarao, Thibault Lesieur and Lenka Zdeborová. IPhT, CEA
12:45 - 13:00: End-to-end Causal Modelling. Diviyan Kalainathan, Olivier Goudet, Philippe Caillou, Isabelle Guyon, Michèle Sébag and Paola Tubaro. LRI, Inria, UPSud
13:00 - 13:15: Weak laws of large numbers for persistence diagrams. Vincent Divol. ENS, UPSud

13:30 - 15:00: Lunch

15:00 - 16:30: Afternoon Session I

Spatio-temporal data
15:00 - 16:00: Keynote Katharina Morik (TU Dortmund University) Machine Learning under Resource Constraints
16:00 - 16:15: Bandit algorithms for power consumption control. Margaux Brégère. EDF R&D
16:15 - 16:20: On Modeling and Analyzing Museum Visitor Movements with Semantic Trajectories. Alexandros Kontarinis. ETIS, U Cergy-Pontoise, DAVID, UVSQ
16:20 - 16:25: Towards a Scalable learning and Classification of Multivariate Time Series. Ali Mzahem, Yehia Taher and Karine Zeitouni. DAVID, UVSQ

16:30 - 17:00: Afternoon Session II

16:30 - 16:45: From safe screening rules to working sets for faster Lasso-type solvers. Mathurin Massias, Alexandre Gramfort and Joseph Salmon. Inria Saclay, LTCI, Telecom ParisTech
16:45 - 17:00: Bi-Objective Integer Programming For RNA Secondary Structure Prediction With Pseudoknots. Audrey Legendre, Eric Angel and Fariza Tahi. IBISC, U Evry

17:00 - 18:30: Poster Session and Cocktail

Friday 15 September 2017

9:15 - 9:30: Coffee

9:30 - 11:00: Morning Session I

Machine learning and deep learning
9:30 - 10:30: Keynote Olivier Bousquet (Google Zurich) Building Intelligent Systems at Scale with Deep Learning
10:30 - 10:45: Robust deep learning: A case study. Victor Estrade, Cécile Germain, Isabelle Guyon and David Rousseau. LRI, Inria, LAL, UPSud
10:45 - 11:00: SPI-DNA: End-to-end Deep Learning Approach for Demographic History Inference. Théophile Sanchez, Guillaume Charpiat and Flora Jay. LRI, Inria, CNRS, UPSud

11:00 - 12:00: Poster session II and coffee break

12:00 - 13:20: Morning Session II

Machine learning and statistical theory
12:00 - 12:15: Supervised layer Self-Organizing Maps with reject options. Ludovic Platon, Farida Zehraoui and Fariza Tahi. IBISC, IPS2, U Evry
12:15 - 12:30: On the benefits of output sparsity for multi-label classification. Evgenii Chzhen, Christophe Denis, Mohamed Hebiri and Joseph Salmon. Telecom ParisTech, UPEM
12:30 - 12:45: Scalable Model-based Cascaded Imputation of Missing Data. Jacob Montiel, Jesse Read, Albert Bifet and Talel Abdessalem.
12:45 - 13:00: To be Confirmed

13:00 - 13:20: CDS/RAMP, Balázs Kégl. LAL, CNRS, UPSud

13:20 - 13:30: Talk & Poster Awards

13:30: Lunchbox to go

Afternoon Student Activity

Game and Microsoft Center Tour!
Please fill out the following short form:

Call for submissions

2nd call for posters:

A second call for posters is now open (deadline July 23rd, midnight).
Please follow the same guidelines and submission system as described below.

We will award a prize to the best communication.

We invite Master M2 and PhD students from Université Paris-Saclay to submit an extended abstract of up to 3 pages describing new or preliminary results of their work in one of the three forms: oral talk, poster or poster-demo (all in English). Master students are encouraged to submit posters even if they do not have substantial results at the time of submission. Submissions should be formatted according to the Springer Lecture Notes in Computer Science style.

Extended Abstract Instructions:

  • Please follow this simplified template: PDF ; latex source
  • Language = English
  • Number of pages = 1 to 3
  • Maximum number of Table or Figure = 1
  • Mandatory paragraphs = Abstract, Keywords, Motivation, References (max 10 ref.)

Each extended abstract must be submitted online via the EasyChair submission system (jdseparis17).


The topics of the conference are listed below:

  • data mining
  • databases
  • big data analytics
  • machine learning
  • statistics
  • semantic web
  • scientific workflows
  • distributed data and computing
  • applications of data science (biomedical and biological data, physics, chemistry, smart cities, image, documents, audio, video, on-line advertisement, ...)

The deadline for submissions for oral and poster presentations was extended; see Important Dates for details.

The extended abstracts will be reviewed by the scientific program committee, possibly including one junior PC member (PhD student or postdoc in data science). They will be selected for oral (15 min) or poster presentation (flash talks, poster and poster-demo sessions) according to their originality and relevance to the conference topics. All the presentations should be in English. Electronic versions of the extended abstracts will be accessible on the conference web site. The book of abstracts will not be published and the extended abstracts will not constitute a formal publication.

Note: Only Master M2 and PhD students from Université Paris-Saclay are invited to contribute.

Important Dates

Contributions submission deadline: June 2nd 2017, midnight (extended from May 29th 2017)

Acceptance notification: July 3rd, 2017

Second call (posters only) - deadline: July 23rd 2017, midnight

Registration is open

Registration closes: September 4th, 2017 (extended from July 26th, 2017)
We know you might be on holiday right now, but the sooner you register the easier it is for us, so please take the time to fill out the form!

Conference: September 14-15th, 2017


This second edition of the Paris-Saclay Junior Conference on Data Science and Engineering is open only to students and researchers at Université Paris-Saclay. Registration is free but mandatory.

Please fill out the conference registration form and the Student Afternoon Activity Quiz.


Sarah Cohen-Boulakia
Chair, Université Paris-Sud, UPsay

Flora Jay
Co-chair, Université Paris-Sud, UPsay

Florence d'Alché-Buc
Télécom ParisTech, UPsay

Sylvain Arlot
Université Paris-Sud, UPsay

Albert Bifet
Télécom ParisTech, UPsay

Isabelle Huteau

Joseph Salmon
Télécom ParisTech, UPsay

Karine Zeitouni
Université de Versailles-Saint-Quentin-en-Yvelines, UPsay

Samuel Delcourt
M2 DataScience, President of the Student Committee

Hamid Jalalzai
M2 DataScience, Vice-President of the Student Committee

Pierre Andrieu
M2 AMI2B, Member of the Student Committee

Thomas Denecker
M2 AMI2B, Member of the Student Committee

Pauline Fourgoux
M2 AMI2B, Member of the Student Committee

Aris Tritas
M2 AIC, Member of the Student Committee

Program Committee

Florence d'Alché-Buc (Télécom ParisTech, LTCI)

Alexandre Allauzen (Université Paris-Sud, LIMSI)

Sylvain Arlot (Université Paris-Sud, LMO)

Nacera Bennacer (CentraleSupelec, E3S)

Albert Bifet (Télécom ParisTech, LTCI)

Christophe Blanchet (Institut Français de Bioinformatique)

Stephan Clémençon (Télécom ParisTech, LTCI)

Sarah Cohen-Boulakia (Université Paris-Sud, LRI)

Arnak Dalalyan (ENSAE ParisTech, CREST)

Juliette Dibie-Barthélemy (AgroParisTech/INRA, MMIP)

Cyril Furtlehner (Inria, Université Paris-Sud, LRI)

Christophe Giraud (Université Paris-Sud/Ecole Polytechnique, CMAP)

Alexandre Gramfort (Télécom ParisTech, LTCI)

Blaise Hanczar (Université d'Evry, IBISC)

Flora Jay (Université Paris-Sud/CNRS, LRI)

Balázs Kégl (Université Paris-Sud/CNRS, LAL)

Christine Keribin (Université Paris-Sud, LMO)

Erwan Le Pennec (Ecole Polytechnique, LIX)

Silviu Maniu (Université Paris-Sud, LRI)

Eric Moulines (Ecole Polytechnique, CMAP)

Nathalie Pernelle (Université Paris-Sud, LRI)

Emmanuel Pietriga (Inria, LRI)

Philippe Pucheral (Université Versailles Saint Quentin/Inria, DAVID)

Joseph Salmon (Télécom ParisTech, LTCI)

Marie Szafranski (ENSIIE, LaMME)

Fabian Suchanek (Télécom ParisTech, LTCI)

Xavier Tannier (Université Paris-Sud, LIMSI)

Arthur Tenenhaus (CentraleSupelec, L2S)

Karine Zeitouni (Université Versailles Saint Quentin, DAVID)

Pierre Zweigenbaum (Université Paris-Sud, LIMSI)

Berna Bakir Batu (Inria, LRI)

Takiy Berrandou (Inserm/UPS/UVSQ, CESP)

Jieying Chen (Université Paris-Sud, LRI)

Antonin Della Noce (CentraleSupelec, MICS)

Tom Dupre La Tour (Télécom ParisTech, LTCI)

François Gonard (Inria, LRI & IRT SystemX)

Alexandros Kontarinis (U Cergy Pontoise, ETIS & U Versailles Saint Quentin, DAVID)

Anna Korba (Télécom ParisTech, LTCI)

Mathurin Massias (Télécom ParisTech, LTCI)

Eugene Ndiaye (Télécom ParisTech, LTCI)

Joe Raad (AgroParisTech/INRA, MIA-Paris)

Thomas Rebele (Télécom ParisTech, LTCI)

Hoang Van Ha (Université Paris-Sud, LMO)

Partners and sponsors

This work was partially supported by a public grant as part of the Investissement d'avenir project (ANR-11-LABX-0056-LMH), and by Labex DigiCosme (ANR-11-LABEX-0045-DIGICOSME), operated by ANR as part of the program Investissement d'Avenir Idex Paris Saclay (ANR-11-IDEX-0003-02).


Laboratoire de l'Accélérateur Linéaire
Centre Scientifique d'Orsay
Bâtiment 200 - BP 34
91898 ORSAY Cedex