Junior Conference on Data Science and Engineering
Paris-Saclay, 14th-15th September 2017


The second edition of the Paris-Saclay Junior Conference on Data Science and Engineering is addressed to first-year PhD students, M2 students and third-year engineering school students at Paris-Saclay. It will offer these students the opportunity to present the scientific work they carried out during internships or in the first year of their thesis, and to sharpen their critical thinking at a professional conference hosting prestigious invited speakers from academia and industry.

The conference aims to gather a large audience of Master's, engineering school and PhD students, and is an excellent way to discover the research world in Data Science and Engineering.

This year, PhD students are also involved in the conference organization as reviewers, session chairs and organizers of networking events.
Please contact us if you are interested in joining the team.

Previous year program

Follow us on Twitter! #JDSE2017

Photo credits: I. Manolescu, M2 BIBS students, L. Pauleve


Congratulations to Ismael Lemhadri (CMAP, Ecole polytechnique, “Community recovery in stochastic block models using semi-definite programming”, work in close collaboration with Youssouf Emin, Ecole Polytechnique) and Mathurin Massias (Inria Saclay, LTCI, Télécom ParisTech, “From safe screening rules to working sets for faster Lasso-type solvers”), who won the best talk award (ex aequo); to Rafael Pinot (Institut LIST, CEA Paris-Saclay, “Nodes clustering in a graph under differential privacy constraints”), who won the best poster award; and to Margaux Brégère (EDF R&D, “Bandit algorithms for power consumption control”) for her dynamic talk and promising work.

An article about the JDSE2017.

Watch the JDSE2017 conference replay.

The booklet with all extended abstracts can be found here.

Follow us on Twitter! You will find pictures of JDSE2017 and information about JDSE2018 before anyone else!

Contact us as early as you like if you would like to be part of the junior or senior program committees or the local committee of JDSE2018, i.e. to take part in one or more of the following tasks: planning the event beforehand, spreading the word, organizing the student Friday activity, reviewing scientific papers, chairing sessions, helping with logistics during the conference, and so on.


Katharina Morik
Professor at the TU Dortmund University, Germany

Machine Learning under Resource Constraints

Abstract: Big data are produced by various sources, often distributed over several measuring entities. Whereas the sensors have restricted capabilities, the compute clusters where the data are stored are usually very fast. In both cases, energy consumption needs to be restricted. In this talk, the interplay of data analysis at the sensors and in the cloud, together with the application of the analysis, is explained. We discuss opportunities for using sophisticated models for learning spatio-temporal models. In particular, we investigate graphical models, which generate the probabilities for connected (sensor) nodes. We even approximate likelihood estimates such that they can be computed on very restricted devices.

Ulf Leser
Professor at the Institute for Computer Science, Humboldt-Universität zu Berlin, Germany

Web-Scale Domain-Specific Information Extraction

Abstract: Information Extraction (IE) from unstructured texts is a technology with growing importance in many applications. Three important challenges to IE are the achievement of high quality results, scalability of methods to very large corpora, and integration of IE results with other data for downstream analysis. In this talk, we will highlight recent advances and open questions in these areas by drawing from extensive experiences in developing and applying IE for biomedical research.

Olivier Bousquet
Head of Machine Learning Research, Google, Zürich, Switzerland

Building Intelligent Systems at Scale with Deep Learning

Abstract: Recent advances in Deep Learning are at the heart of the current AI boom. We will review some of these advances in areas such as image understanding, machine translation or robotics, and analyze the factors that can explain the observed progress, from the availability of large amounts of labeled data to new computing paradigms as well as algorithmic and scientific insights. Looking forward, we will discuss the remaining scientific and technical challenges and highlight some of the promising research directions in the field.


Information to the authors
Speakers, please come to the registration desk with your slides as a PDF file on a USB stick as soon as possible, and no later than the break before your session.
All posters will be presented during both sessions (Thursday and Friday).

The talks will be in English.
Watch the JDSE2017 conference live stream, or the replay afterwards.

Thursday 14 September 2017

9:15 - 9:45: Coffee and conference registration

9:50 - 10:15: Welcome message. S. Cohen-Boulakia and F. Jay, PC chairs; F. d'Alché-Buc, DigiCosme

10:15 - 11:45: Morning Session I

Databases and ontologies
10:15 - 11:15: Keynote Ulf Leser (Humboldt University) Web-Scale Domain-Specific Information Extraction
Chair: Ludovic Platon (IBISC)
11:15 - 11:30: On the Automatic Distribution and Parallelization of a miRNA Prediction Algorithm using Spark and Mesos frameworks. Alexandre Protat, Laurent Poligny, Nazim Agoulmine and Fariza Tahi. IBISC, U Evry
11:30 - 11:35: Repairing regular expressions by adding missing words. Thomas Rebele, Katerina Tzompanaki and Fabian M. Suchanek. Telecom ParisTech, U Cergy-Pontoise
11:35 - 11:40: Grouping Answers in Ontology-Based Query Answering: the RDFS case. Maxime Buron and Michaël Thomazo. LIX, Inria
11:40 - 11:45: Forecasting bike sharing demand. Aurélie Fréchet. EDF R&D

11:45 - 12:15: Coffee break

12:15 – 13:15: Morning Session II

Community, graph
Chair: Van Ha Hoang (UPSud)
12:15 - 12:30: Community recovery in stochastic block models using semi-definite programming. Youssouf Emin and Ismael Lemhadri. Ecole polytechnique
12:30 - 12:45: Phase transitions in hierarchical community detection. Stefano Sarao, Thibault Lesieur and Lenka Zdeborová. IPhT, CEA
12:45 - 13:00: End-to-end Causal Modelling. Diviyan Kalainathan, Olivier Goudet, Philippe Caillou, Isabelle Guyon, Michèle Sébag and Paola Tubaro. LRI, Inria, UPSud
13:00 - 13:15: Weak laws of large numbers for persistence diagrams. Vincent Divol. ENS, UPSud

13:30 - 15:00: Lunch at CESFO Bures-sur-Yvette (turn left when leaving the LAL; here is the way)

15:00 – 16:30: Afternoon Session I

Spatio-temporal data
15:00 - 16:00: Keynote Katharina Morik (TU Dortmund University) Machine Learning under Resource Constraints
Chair: Anna Korba (Telecom ParisTech)
16:00 - 16:15: Bandit algorithms for power consumption control. Margaux Brégère. EDF R&D
16:15 - 16:20: On Modeling and Analyzing Museum Visitor Movements with Semantic Trajectories. Alexandros Kontarinis. ETIS, U Cergy-Pontoise, DAVID, UVSQ
16:20 - 16:25: Towards a Scalable learning and Classification of Multivariate Time Series. Ali Mzahem, Yehia Taher and Karine Zeitouni. DAVID, UVSQ

16:30 – 17:00: Afternoon Session II

Chair: Antonin Della Noce (CentraleSupélec)
16:30 - 16:45: From safe screening rules to working sets for faster Lasso-type solvers. Mathurin Massias, Alexandre Gramfort and Joseph Salmon. Inria Saclay, LTCI, Telecom ParisTech
16:45 - 17:00: Bi-Objective Integer Programming For RNA Secondary Structure Prediction With Pseudoknots. Audrey Legendre, Eric Angel and Fariza Tahi. IBISC, U Evry

17:00 – 18:30: Poster Session and Cocktail

Demos/posters program
Posters will be presented during both sessions (Thursday and Friday) by all authors.

Friday 15 September 2017

9:15 - 9:30: Coffee

9:30 - 11:00: Morning Session I

Machine learning and deep learning
9:30 - 10:30: Keynote Olivier Bousquet (Google Zurich) Building Intelligent Systems at Scale with Deep Learning.
Chair: Takiy Berrandou (CESP)
10:30 - 10:45: Robust deep learning: A case study. Victor Estrade, Cécile Germain, Isabelle Guyon and David Rousseau. LRI, Inria, LAL, UPSud
10:45 - 11:00: SPI-DNA: End-to-end Deep Learning Approach for Demographic History Inference. Théophile Sanchez, Guillaume Charpiat and Flora Jay. LRI, Inria, CNRS, UPSud

11:00 - 12:00: Poster session II and coffee break


12:00 – 13:20: Morning Session II

Machine learning and statistical theory
Chair: Eugène Ndiaye (Telecom ParisTech)
12:00 - 12:15: Supervised layer Self-Organizing Maps with reject options. Ludovic Platon, Farida Zehraoui and Fariza Tahi. IBISC, IPS2, U Evry
12:15 - 12:30: On the benefits of output sparsity for multi-label classification. Evgenii Chzhen, Christophe Denis, Mohamed Hebiri and Joseph Salmon. UPEM, Telecom ParisTech
12:30 - 12:45: Scalable Model-based Cascaded Imputation of Missing Data. Jacob Montiel, Jesse Read, Albert Bifet and Talel Abdessalem. LTCI, Telecom ParisTech, LIX, Ecole Polytechnique, IPAL, CNRS, U Singapore
12:45 - 13:00: Ranking Data with Continuous Labels through Oriented Recursive Partitions. Mastane Achab and Stéphan Clémençon. LTCI, Telecom ParisTech

13:00: RAMP/CDS, Balázs Kégl. LAL, CNRS, UPSud
13:15: A word about the bee-o-diversity challenge, by Laurent Cetinsoy and Aris Tritas

13:20: Talk & Poster Awards

13:30: Lunchbox to go

Afternoon Student Activity

Game and Microsoft Center Tour!
Please fill out the following short form. The form is now closed.

Posters will be presented during both sessions (Thursday and Friday) by all authors. Please design your poster in an A0 format (maximum size), portrait orientation.

Grouping Answers in Ontology-Based Query Answering: the RDFS case
Maxime Buron and Michaël Thomazo
INRIA ; LIX, Ecole Polytechnique ; Université Paris-Saclay

Forecasting bike sharing demand
Aurélie Fréchet
Ecole Nationale de la Statistique et de l’Analyse de l’Information, EDF R&D

Repairing regular expressions by adding missing words
Thomas Rebele, Katerina Tzompanaki and Fabian M. Suchanek
Télécom ParisTech ; Université Cergy-Pontoise

Beyond stochastic gradient for maximum likelihood based ICA on EEG and MEG
Pierre Ablin, Alexandre Gramfort and Jean-François Cardoso
Télécom ParisTech ; Inria, Parietal team ; Institut d’Astrophysique de Paris

Alzheimer’s disease diagnosis using synchrony and disorder measures
Amira Aljane and Nesma Houmani
SAMOVAR, Télécom SudParis, CNRS, Université Paris-Saclay

Deep specification and verification of SQL compilation chain
Léo Andrès, Raphaël Corner and Eunice Martins
LRI, Université de Paris-Sud, Paris-Saclay

Segmentation and Detection in large microscopy for neural development and organization
Tania Bacoyannis, Anatole Chessel, Emmanuel Beaurepaire, Jean Livet and Lamiae Abdeladim
Laboratory of Optics and Biosciences, Ecole Polytechnique, Institut de la Vision

Multi-omics data integration to model iron metabolism in pathogenic yeast species
Thomas Denecker and Gaëlle Lelandais
Institut de Biologie Intégrative de la Cellule, Université de Paris-Sud

Finding Interesting Aggregates in RDF Graphs
Yanlei Diao, Ioana Manolescu and Shu Shang
Ecole Polytechnique, Inria Saclay, and Université Paris-Saclay

Estimation non-paramétrique pour des réseaux aléatoires
Yann Issartel, Christophe Giraud and Nicolas Verzelen
Université Paris-Sud XI, Laboratoire de Mathématiques d'Orsay ; INRA, SupAgro, Montpellier

Modeling Spatio-Temporal Data with Operator-Valued Kernel Methods: an Application to Epidemics
Camille Jandot and Florence d'Alché-Buc
LTCI, Télécom ParisTech

Efficient Learning of Functional Outputs using Operator Random Fourier Features
Alex Lambert, Romain Brault and Florence d'Alché-Buc
LTCI, Télécom ParisTech

Breaking boundaries between language and database runtimes
Julien Lopez
LRI, Université Paris-Sud

Smart ease: reducing energy costs by storage and consumption forecasts
Charles Lorenzo and Pierre-Louis Guhur
EEA, ENS Paris-Saclay

Modelizing transformation processes with ontology and probabilistic relational models
Melanie Munch
UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay

Exploring Data Mining Techniques for Opportunistic Mobile Sensing Making Sense of Spatial Multivariate Time Series
Ahmad Mustapha, Yehia Taher and Karine Zeitouni
DAVID, Université Versailles Saint-Quentin

Big Data Internship: Exploration & analysis of mobile communication's technical data - ByPass Fraud Detection
El Mehdi Oumouss
Télécom ParisTech, Université Paris-Saclay

Streaming Comparison Benchmark between Spark and Flink using Kafka
Carlos Perez
Université Paris-Saclay

Nodes clustering in a graph under differential privacy constraints
Rafael Pinot
Institut LIST, CEA, Université Paris-Saclay

Aircraft Dynamics Identification
Cédric Rommel, Frédéric Bonnans, Pierre Martinon and Baptiste Gregorutti
CMAP, Ecole Polytechnique ; INRIA, Safety Line

Automatic Machine Learning: benchmark and future work
Lisheng Sun
LRI, Université Paris-Sud, Paris-Saclay

Modeling Spatially-Correlated Cellular Networks by Applying Inhomogeneous Poisson Point Processes
Shanshan Wang and Marco Di Renzo
Laboratoire des signaux et systèmes, Centrale Supélec, Université Paris-Saclay, CNRS

Channel modeling analysis of visible light communication by stochastic geometry
Xiaojun Xi and Marco Di Renzo
Laboratoire des signaux et systèmes, Centrale Supélec, Université Paris-Saclay, CNRS

Call for submissions

2nd call for posters:

A second call for posters is now open (deadline July 23rd, midnight).
Please follow the same guidelines and submission system as described below.

We will award a prize to the best communication.

We invite Master M2 and PhD students from Université Paris-Saclay to submit an extended abstract of up to 3 pages describing new or preliminary results of their work in one of the three forms: oral talk, poster or poster-demo (all in English). Master students are encouraged to submit posters even if they do not have substantial results at the time of submission. Submissions should be formatted according to the Springer Lecture Notes in Computer Science style.

Extended Abstract Instructions:

  • Please follow this simplified template: PDF; LaTeX source
  • Language = English
  • Number of pages = 1 to 3
  • Maximum number of tables or figures = 1
  • Mandatory paragraphs = Abstract, Keywords, Motivation, References (max 10 ref.)

Each extended abstract must be submitted online via the EasyChair submission system (jdseparis17).


The topics of the conference are listed below:

  • data mining
  • databases
  • big data analytics
  • machine learning
  • statistics
  • semantic web
  • scientific workflows
  • distributed data and computing
  • applications of data science (biomedical and biological data, physics, chemistry, smart cities, image, documents, audio, video, on-line advertisement, ...)

The deadline for submissions for oral and poster presentations was extended; see DATES for details.

The extended abstracts will be reviewed by the scientific program committee, possibly including one junior PC member (PhD student or postdoc in data science). They will be selected for oral (15 min) or poster presentation (flash talks, poster and poster-demo sessions) according to their originality and relevance to the conference topics. All the presentations should be in English. Electronic versions of the extended abstracts will be accessible on the conference web site. The book of abstracts will not be published and the extended abstracts will not constitute a formal publication.

Note: Only Master M2 and PhD students from Université Paris-Saclay are invited to contribute.

Important Dates

Contributions submission deadline: June 2nd 2017, midnight (extended from May 29th 2017)

Acceptance notification: July 3rd, 2017

Second call (posters only) - deadline: July 23rd 2017, midnight

Registration is open

Registration closes: September 4th, 2017 (extended from July 26th, 2017)
We know you might be on holiday right now, but the sooner you register, the easier it is for us, so please take the time to fill out the form!

Conference: September 14-15th, 2017


This second edition of the Paris-Saclay Junior Conference on Data Science and Engineering is addressed only to students and researchers at Université Paris-Saclay. Registration is free but mandatory.

The main form is now closed; please fill out the late registration form.


Sarah Cohen-Boulakia
Chair, Université Paris-Sud, UPsay

Flora Jay
Co-chair, Université Paris-Sud, UPsay

Florence d'Alché-Buc
Télécom ParisTech, UPsay

Sylvain Arlot
Université Paris-Sud, UPsay

Albert Bifet
Télécom ParisTech, UPsay

Isabelle Huteau

Victoria Perez de Laborda
Center For Data Science

Joseph Salmon
Télécom ParisTech, UPsay

Karine Zeitouni
Université de Versailles-Saint-Quentin-en-Yvelines, UPsay

Samuel Delcourt
M2 DataScience, President of the Student Committee

Hamid Jalalzai
M2 DataScience, Vice-President of the Student Committee

Pierre Andrieu
M2 AMI2B, Member of the Student Committee

Laurent Cetinsoy
M2 AIC, Member of the Student Committee

Thomas Denecker
M2 AMI2B, Member of the Student Committee

Stéphanie Chevalier
M2 AMI2B, Member of the Student Committee

Pauline Fourgoux
M2 AMI2B, Member of the Student Committee

Nadège Guiglielmoni
M2 AMI2B, Member of the Student Committee

Pierre Merckaert
M2 AMI2B, Member of the Student Committee

Aris Tritas
M2 AIC, Member of the Student Committee

Program Committee

Florence d'Alché-Buc (Télécom ParisTech, LTCI)

Alexandre Allauzen (Université Paris-Sud, LIMSI)

Sylvain Arlot (Université Paris-Sud, LMO)

Nacera Bennacer (CentraleSupelec, E3S)

Albert Bifet (Télécom ParisTech, LTCI)

Christophe Blanchet (Institut Français de Bioinformatique)

Stéphan Clémençon (Télécom ParisTech, LTCI)

Sarah Cohen-Boulakia (Université Paris-Sud, LRI)

Arnak Dalalyan (ENSAE ParisTech, CREST)

Juliette Dibie-Barthélemy (AgroParisTech/INRA, MMIP)

Cyril Furtlehner (Inria, Université Paris-Sud, LRI)

Christophe Giraud (Université Paris-Sud/Ecole Polytechnique, CMAP)

Alexandre Gramfort (Télécom ParisTech, LTCI)

Blaise Hanczar (Université d'Evry, IBISC)

Flora Jay (Université Paris-Sud/CNRS, LRI)

Balázs Kégl (Université Paris-Sud/CNRS, LAL)

Christine Keribin (Université Paris-Sud, LMO)

Erwan Le Pennec (Ecole Polytechnique, LIX)

Silviu Maniu (Université Paris-Sud, LRI)

Eric Moulines (Ecole Polytechnique, CMAP)

Nathalie Pernelle (Université Paris-Sud, LRI)

Emmanuel Pietriga (Inria, LRI)

Philippe Pucheral (Université Versailles Saint Quentin/Inria, DAVID)

Joseph Salmon (Télécom ParisTech, LTCI)

Marie Szafranski (ENSIIE, LaMME)

Fabian Suchanek (Télécom ParisTech, LTCI)

Xavier Tannier (Université Paris-Sud, LIMSI)

Arthur Tenenhaus (CentraleSupelec, L2S)

Karine Zeitouni (Université Versailles Saint Quentin, DAVID)

Pierre Zweigenbaum (Université Paris-Sud, LIMSI)

Berna Bakir Batu (Inria, LRI)

Takiy Berrandou (Inserm/UPS/UVSQ, CESP)

Jieying Chen (Université Paris-Sud, LRI)

Antonin Della Noce (CentraleSupelec, MICS)

Tom Dupre La Tour (Télécom ParisTech, LTCI)

François Gonard (Inria, LRI & IRT SystemX)

Alexandros Kontarinis (U Cergy Pontoise, ETIS & U Versailles Saint Quentin, DAVID)

Anna Korba (Télécom ParisTech, LTCI)

Mathurin Massias (Télécom ParisTech, LTCI)

Eugène Ndiaye (Télécom ParisTech, LTCI)

Joe Raad (AgroParisTech/INRA, MIA-Paris)

Thomas Rebele (Télécom ParisTech, LTCI)

Hoang Van Ha (Université Paris-Sud, LMO)

Partners and sponsors

This work was partially supported by a public grant as part of the Investissement d'avenir project (ANR-11-LABX-0056-LMH), and by Labex DigiCosme (ANR-11-LABEX-0045-DIGICOSME), operated by ANR as part of the program Investissement d'Avenir Idex Paris Saclay (ANR-11-IDEX-0003-02).


Laboratoire de l'Accélérateur Linéaire
Centre Scientifique d'Orsay
Bâtiment 200 - BP 34
91898 ORSAY Cedex