Junior Conference on Data Science and Engineering
Paris-Saclay, 15-16 September 2016


This first edition of the Paris-Saclay Junior Conference on Data Science and Engineering is addressed to first-year PhD students, M2 students and third-year engineering school students at University Paris-Saclay. It offers these students the opportunity to present scientific work carried out during internships or in the first year of their thesis, and to sharpen their critical thinking at a real conference with prestigious invited keynote speakers.



Thursday 15 September 2016

9:30am - 10:35am Morning Session I
Chair: Sarah Cohen-Boulakia (LRI, CNRS, UPSud)
9:30am - 9:35am Introduction, Florence d'Alché-Buc and Albert Bifet (LTCI, Télécom ParisTech)
9:35am - 10:35am Keynote Speaker: Patrick Valduriez (INRIA, Montpellier, France) - From Databases to Data Science: impact on information systems.

10:35am - 11:00am Coffee break

11:00am – 12:40pm Morning Session II
Chair: Jacob Montiel (LTCI CNRS, Télécom ParisTech)
11:00am - 11:25am AstroSpark - Towards a Distributed Data Server for Big Data in Astronomy, Mariem Brahem, Karine Zeitouni, Stephane Lopes and Laurent Yeh (UVSQ).
11:25am - 11:50am Modelling and Identifying Inter-Individual Plant Variability, Antonin Della Noce, Véronique Letort, Charlotte Baey and Paul-Henry Cournède (MICS, CentraleSupelec).
11:50am - 12:15pm Casie: Comparing Annotation Systems for Information Extraction, Thomas Rebele, Katerina Tzompanaki and Fabian M. Suchanek (LTCI CNRS, Télécom ParisTech).
12:15pm - 12:40pm Deep NN for Job Offer/CV Matching, François Gonard, Marc Schoenauer and Michèle Sebag (LRI CNRS, UPSud, INRIA Saclay)

12:50pm - 2:30pm Lunch at PROTO 204

2:30pm – 3:30pm Afternoon Session I
Chair: Joseph Salmon (LTCI CNRS, Télécom ParisTech)
2:30pm – 3:30pm Keynote speaker: Hervé Jegou (FAIR (Facebook), Paris, France) - Research in Artificial Intelligence at Facebook

3:30pm – 4:30pm – Poster Session and coffee break

4:30pm – 6:10pm Afternoon Session II – Statistical Learning and optimization
Chair: Eugene Ndiaye (LTCI CNRS, Télécom ParisTech)
4:30 pm-4:55 pm Learning Hyperparameters for Unsupervised Anomaly Detection, Albert Thomas, Stéphan Clémençon, Vincent Feuillard and Alexandre Gramfort (LTCI CNRS Télécom ParisTech).
4:55 pm-5:20 pm Frank-Wolfe Algorithms for Saddle Point Problems, Gauthier Gidel, Tony Jebara and Simon Lacoste-Julien (MVA, INRIA Paris)
5:20 pm-5:45 pm Structured prediction with Fisher Output Embedding, Moussab Djerrab, Maxime Sangnier and Florence d'Alché-Buc (LTCI, Télécom ParisTech)
5:45 pm – 6:10 pm Capacity constrained optimal transport for domain adaptation, Stanislas Chambon, Gilles Wainrib, Mathieu Galtier and Alexandre Gramfort (LTCI, Télécom ParisTech).

Friday 16 September 2016

9:30am - 10:40am Morning Session I
Chair: Sylvain Arlot (UPSud)
9:30am – 9:40am – EIT Digital Doctoral School talk
9:40am - 10:40am Keynote speaker: Gabor Lugosi (Dept of economics, Pompeu Fabra University, Barcelona, Spain), Finding Adam in randomly growing trees

10:40am - 11:00am Coffee break

11:00am – 12:40pm Morning Session II
Chair: Claire Vernade (LTCI CNRS, Télécom ParisTech)
11:00am - 11:25am Controlling the distance to Kemeny consensus without computing it, Yunlong Jiao, Anna Korba* and Eric Sibony (*LTCI, Télécom ParisTech).
11:25am - 11:50am 1-bit Matrix Completion: PAC-Bayesian Analysis of a Variational Approximation, Vincent Cottet and Pierre Alquier (ENSAE ParisTech)
11:50am - 12:15pm Discovery of Topic-Relevant Content on the Web, Aris Tritas and Odalric-Ambrym Maillard (INRIA Saclay, LRI, Université Paris-Sud).
12:15pm - 12:40pm Gap Safe Screening Rules for Sparse-Group Lasso, Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort and Joseph Salmon (LTCI, Télécom ParisTech)

12:40pm - 2:30pm Lunch at PROTO 204

2:30pm – 3:30pm Afternoon Session I
Chair: Florence d'Alché-Buc
2:30 pm-3:30 pm Keynote speaker: Isabelle Guyon (LRI CNRS, INRIA Saclay) – Quels nouveaux défis en Science des Données? (What New Challenges in Data Science?)

3:30pm – 4:30pm – Poster Session and coffee break

4:30pm – 6:20pm Afternoon Session II
Chair: Raef Mousheimish (UVSQ)
4:30pm – 4:55pm Spatio-Temporal trajectories for neurodegenerative diseases, Igor Koval and Stanley Durrleman (M2 Data Science, INRIA Saclay).
4:55pm - 5:20pm Deduplication and Record Linkage with Big Data Frameworks, Joaquin Melgarejo and Mickael Patron (LTCI CNRS, Télécom ParisTech).
5:20pm - 5:45pm Investigating spatial and social dimension of epidemic dynamics, Basile Calderan and Stefania Rubrichi (M2 Data Science, Orange Labs).
5:45pm - 6:10pm Structured prediction of argumentative structures from User-Generated Web Discourse, Alexandre Garcia, Chloé Clavel, Slim Essid and Florence d'Alché-Buc (M2 MVA, LTCI CNRS, Télécom ParisTech).
6:10pm - 6:20pm Wrap-up and Closure


From Databases to Data Science: impact on information systems
Patrick Valduriez

Abstract: Data has been called the new oil, to reflect that big data can be turned into high-value information and new knowledge. Although data analysis has been around for a while, starting with statistics and evolving lately into exploratory data analysis, data mining and business intelligence, the new dimensions of big data (volume, variety, velocity, etc.) make it very hard to process and analyze data online, and to derive sound conclusions. In particular, relational DBMSs, which are at the heart of any information system, have lately been criticized for their “one size fits all” approach. Although they have been able to integrate support for all kinds of data (e.g., multimedia objects, XML and JSON documents) and new functions, this has resulted in a loss of performance and flexibility for new data-intensive applications. To address this grand challenge, data science is emerging as a new science that combines data management, statistics and machine learning, visualization and human-computer interactions to collect, clean, integrate, analyze and visualize big data. The ultimate goal is to create new data products and services, and to train legions of data scientists. In this talk, I will introduce data science, in relation to databases, and discuss its impact on information systems. I will also illustrate the main opportunities and risks, in particular by telling my favorite stories about the good, the bad and the ugly.

Research in Artificial Intelligence at Facebook
Hervé Jegou

Abstract: I will first present some of the research activity carried out at FAIR, the Artificial Intelligence laboratory of Facebook. Our research covers very different topics, such as computer vision, natural language processing, reasoning, optimization and large-scale processing. I will then present a specific work on similarity search, namely Polysemous codes (joint work with Matthijs Douze and Florent Perronnin). It addresses the problem of efficiently searching billion-sized collections of high-dimensional vectors, representing for instance images. I will conclude with a discussion comparing my experience of research in academia and at Facebook.

Finding Adam in randomly growing trees
Gabor Lugosi

Abstract: Many large networks are formed by dynamic growth that can be modelled by simple random mechanisms. A natural problem is to discover the "past" of such networks by merely observing their present state. This talk addresses one of the simplest such problems one may formulate: find the first vertex in large random trees generated by either the uniform attachment or preferential attachment model. We study algorithms that, upon observing the tree, output a set of K vertices. We require that, with probability at least 1 – ε, the first vertex is in this set. We show that for any ε, there exist such algorithms with K independent of the size of the tree.
The talk is based on joint work with Seb Bubeck and Luc Devroye.

Quels nouveaux défis en Science des Données? (What New Challenges in Data Science?)
Isabelle Guyon

Abstract: Whereas a few years ago it was hard to explain what “machine learning” is outside a circle of insiders, today every self-respecting scientist seems to have heard of it and would like to become an expert in “data science”. Yesterday, Artificial Intelligence (AI) championed predicate logic and tried to eradicate neural networks; today the promoters of “deep learning”, heirs of neural networks, are the leaders of the new AI. What does tomorrow hold for us? This talk offers a different perspective, from the angle of applications, and invites the new generation to take up the challenges related to the automation of tasks, to the “interpretability” of learning machines and of their decisions, and to human-machine interfaces.

Call for submissions

We invite Master M2 and PhD students from Université Paris-Saclay to submit an extended abstract of up to 2 pages describing new or preliminary results, to be presented in one of two forms: oral talk or poster. Submissions should be formatted according to the Springer Lecture Notes in Computer Science style. Each extended abstract must be submitted online via the EasyChair submission system.


The topics of the conference are listed below:

  • databases
  • big data analytics
  • workflows
  • distributed data and computing
  • data mining
  • machine learning
  • statistics
  • semantic web
  • large scale data
  • applications of data science (image, documents, audio, video, biomedical and biological data, on-line advertisement, physics, chemistry, smart cities...)

The deadline for submissions for oral and poster presentations is August 19th, 2016.

Update: A second call for submissions is open, with a deadline of September 5th, 2016.

The extended abstracts will be reviewed by the scientific program committee. They will be selected for oral or poster presentation according to their originality and relevance to the conference topics. Electronic versions of the extended abstracts will be accessible to the participants prior to the conference, distributed in hardcopy form to participants at the conference, and will be made publicly available on the conference web site after the conference. However, the book of abstracts will not be published and the extended abstracts will not constitute a formal publication.

Note: Only Master M2 and PhD students from Université Paris-Saclay are invited to contribute.

Important Dates

Contributions Submission deadline: August 19th, 2016

Acceptance notification: August 26th, 2016

Second Contributions Submission deadline: September 5th, 2016

Second Acceptance notification: September 7th, 2016

Conference: September 15-16th, 2016


This first edition of the Paris-Saclay Junior Conference on Data Science and Engineering is addressed only to students and researchers at University Paris-Saclay. Registration is free but mandatory.

Please fill out the registration form provided here.


Florence d'Alché
Chair, Télécom ParisTech, UPSay

Albert Bifet
Chair, Télécom ParisTech, UPSay

Sylvain Arlot
Université Paris-Sud, UPSay

Sarah Cohen Boulakia
Université Paris-Sud, UPSay

Arnak Dalalyan
ENSAE ParisTech, UPSay

Christophe Giraud
Université Paris-Sud, UPSay

Joseph Salmon
Télécom ParisTech, UPSay

Karine Zeitouni

Isabelle Huteau

Program Committee

Talel Abdessalem (Télécom ParisTech)

Alexandre Allauzen (Université Paris-Sud)

Sylvain Arlot (Université Paris-Sud)

Albert Bifet (Télécom ParisTech)

Sarah Cohen Boulakia (Université Paris-Sud)

Stéphan Clémençon (Télécom ParisTech)

Arnak Dalalyan (ENSAE ParisTech)

Florence d'Alché (Télécom ParisTech)

Christophe Giraud (Université Paris-Sud)

Blaise Hanczar (Université d’Evry)

Erwan Le Pennec (Ecole Polytechnique)

Yohan Petetin (Télécom SudParis)

Silviu Maniu (Université Paris-Sud)

Eric Moulines (Ecole Polytechnique)

Joseph Salmon (Télécom ParisTech)

Pierre Senellart (Télécom ParisTech)

Fabian Suchanek (Télécom ParisTech)

Arthur Tenenhaus (CentraleSupelec)

Fariza Tahi (Université d'Evry)

Karine Zeitouni (UVSQ)



Laboratoire de l'Accélérateur Linéaire
Centre Scientifique d'Orsay
Bâtiment 200 - BP 34
91898 ORSAY Cédex