Data Science Clinics

11 February 2021 - 12h15-14h00

Online - registration mandatory


12h15-12h30: Julien Prados, Faculty of Medicine, Microbiology and Molecular Medicine.

"Embedded Data Analysis Support"

Numerical data analysis is becoming an essential tool for research labs. Life scientists in particular depend on instruments that generate large amounts of data. Mastering the whole data analysis process within the lab is essential to query the data efficiently with all the domain-specific questions. Building on my experience, I will try to show the advantages of embedding a data analyst's support within the research lab.


12h30-12h55: Béatrice Joyeux-Prunel, Faculty of Humanities, Digital Humanities.

"Data Science applied to Visual Globalization. The project Visual Contagions"

Images have participated in the cultural homogenization by which globalization is most often identified. But we are quite incapable of explaining how this homogenization has taken place; which images have circulated or been imitated the most in the past; according to which social, cultural, geographic channels; what were their success factors; and whether the global circulation of images has produced more homogenization than heterogeneity. 

Data science can be useful in addressing these questions, as we do in the project Visual Contagions (Swiss National Science Foundation, 2021-2025) and at the Imago Center (Label European Center of Excellence Jean Monnet, ENS/Beaux-Arts de Paris and UNIGE, 2019-2022). Starting with a digital corpus of illustrated printed material, we study the circulation of images over a long period (1890-1990) and on a global scale. What remains is to establish a relevant workflow: which infrastructure is best suited to host our sources and to retrieve illustrations from illustrated pages, without re-hosting data already made available by others? How can we minimize the computing time of our algorithms? Can we envisage pattern descriptions that are interoperable and can be exchanged between projects that apply the same pattern recognition methods? Once the images have been described, how can we visualize their circulation across time, space, and social and cultural environments?


12h55-13h20: Volodymyr Savchenko, Faculty of Science, Department of Astronomy.

"Interoperability and automation in searches for elusive multi-messenger transients"

In the last decade, it became possible to detect new kinds of signals from astrophysical objects: new telescopes yielded the first unambiguous observations of cosmic high-energy neutrino and gravitational wave emission. While these detections were milestone discoveries on their own, it turns out that combining them with observations from traditional telescopes makes it possible to tackle some of the most pressing problems in astrophysics and in physics in general. Our team made key contributions to this domain with observations made by the INTEGRAL observatory, one of the two telescopes to see the first counterpart of a gravitational wave signal.

These efforts pose particular methodological challenges. First of all, telescopes catching different kinds of signals emerged from traditionally diverse domains, which complicates any effort to combine observations. In addition, it turns out that (at least for now) the most prolific multi-messenger sources are short-lived (transient) and require immediate action. They can therefore only be treated with automated workflows, which imposes even stricter requirements on the synergy process.

The need to communicate quickly and meaningfully across often drastically different terminologies, data formats, programming languages, measurement systems, and statistical frameworks triggered dedicated efforts to adopt and develop semantic annotations and metadata. It also became necessary to advance the stewardship of scientific data analysis workflows, enabling transferable analysis techniques that are easily accessible and scalable in cloud-based environments, so as to respond to highly variable streams of astrophysical events.

On the other hand, despite the efforts to refine communication, multi-messenger observations still require researchers to deal with events outside of their direct expertise and to make important decisions from a position of incomplete knowledge. This often leads to the adoption of machine learning techniques.


13h20-13h45: Thomas M.M. Guibentif, Faculty of Sciences, Institute for Environmental Sciences and Department F.-A. Forel for environmental and aquatic sciences.

"Computation errors and elasticity effects: statistical analysis of energy efficiency measures data"

Energy efficiency is one of the main pillars of the energy transition. Its deployment aims at reducing total energy consumption so that a country's energy demand can be met with renewable sources. However, several limitations to this approach have been pointed out. On the one hand, the lack of standardization, accuracy and precision of saving estimation methods reduces the reliability of reported savings. On the other hand, elasticity of energy consumption has been shown to result in rebound effects, i.e. increases in consumption triggered or enabled by energy efficiency deployment through e.g. reinvestment of cost savings or moral licensing, which will be reflected in metered savings.

We have already used one data set from a regional, utility-driven energy efficiency sub-programme on lighting and developed a method to quantify the gap between reported and metered savings. This showed reported savings to be on average on the order of one third higher than metered savings, in line with other findings in the literature. Empirical adjustments of the data suggest that this gap arises primarily from a reduction of metered savings, due to real savings being offset by elasticity effects (linked to the measure implementation) and activity effects (unrelated to measure implementation). The respective shares of these two effects could not be quantified.
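As a purely illustrative sketch of what such a gap could look like numerically, the snippet below computes a relative gap between reported and metered savings per site and averages it. The definition used here, (reported - metered) / metered, the function name relative_gap and the sample figures are all assumptions made for illustration; they are not the method or the data of the study.

```python
from statistics import mean


def relative_gap(reported_kwh: float, metered_kwh: float) -> float:
    """Relative excess of reported over metered savings.

    Assumed definition (not taken from the study):
    (reported - metered) / metered.
    """
    if metered_kwh == 0:
        raise ValueError("metered savings must be non-zero")
    return (reported_kwh - metered_kwh) / metered_kwh


# Hypothetical (reported, metered) savings per site, in kWh.
# These numbers are invented for illustration only.
sites = [(1200.0, 900.0), (800.0, 640.0), (1500.0, 1000.0)]

gaps = [relative_gap(reported, metered) for reported, metered in sites]
mean_gap = mean(gaps)

print(f"per-site gaps: {[round(g, 3) for g in gaps]}")
print(f"mean relative gap: {mean_gap:.3f}")
```

With this definition, a mean gap of about 0.33 would correspond to reported savings being roughly one third higher than metered savings. Averaging per-site ratios is only one possible choice; a ratio of aggregate totals would generally give a different figure.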

So far, we have based our analysis on empirical distribution plots of the gap between reported and metered savings, using different methods to assess reported savings. As a next step, we would like to perform a more rigorous statistical analysis of our data. Assumptions about the nature of the error distributions would have to be formalized and justified so as to determine the form of the gap distributions, whose parameters can then be estimated from our data. On this basis, we expect to precisely quantify the different systematic effects, or at least to establish a relationship between them. This method could then be applied to other datasets to which we have access.