Data Science Clinics

11 December 2020 - 12h15-14h00

Online - registration mandatory



12h15-12h30: Jean-Luc Falcone, Faculty of Sciences, Department of Informatics

"Computing for data science: common issues and common solutions"

Does your research depend on a bunch of scripts written by a postdoc who left the lab two years ago? Have you run a successful proof of concept with a limited data set, but are unable to exploit real-scale data due to performance problems? Do you spend several hours every week manually converting data between software packages in order to analyse your new results?

As data processing becomes pervasive in almost all research domains, scientists are becoming ever more dependent on software and computing infrastructure. Although researchers have access to a large catalog of available software, it is often essential to develop new programs and algorithms, or at least to adapt and update legacy code. The large number of required skills, combined with the rapid turnover of ideas and approaches, makes this problem even more difficult to solve.

The fields of computer science and software engineering provide several tools that help tackle data science problems, from theoretical foundations like algorithmics or formal languages to applied skills like parallel and distributed computing. Collaborations with computer scientists may help researchers make better software: more robust, maintainable, and reproducible. Notably, they may make it possible to scale up and address larger problems by using more efficient methods and programs, and more powerful architectures.


12h30-12h55: Moupiya Maji, Faculty of Sciences, Department of Astronomy

"Building a model to predict the sources of cosmic reionization"

When our Universe was only a billion years old (current age 13.7 billion years) galaxies were still just forming and the Universe was full of neutral hydrogen gas. Around this time the hot stars in these first galaxies emitted an enormous amount of energetic photons and these photons ionized the ubiquitous neutral hydrogen. This milestone event is known as the reionization of the Universe and is an important part of our cosmic history. However, it is not possible to directly observe these ionizing photons as they are all destroyed or absorbed on their long journey to us (almost 200 billion light-years). In our project, we look for other signatures of these ionizing sources to understand how this reionization event unfolded.

Our team performed a large-scale simulation of the early Universe and identified the hot stars and the first galaxies of this distant era. We then tracked the ionizing photons and other emissions from hydrogen, and found that the Lyman alpha emission from hydrogen in galaxies (the photon emitted when the electron in a hydrogen atom drops from the first excited state to the ground state) and the ionizing photons emitted from stars in galaxies are correlated. Unlike the ionizing radiation, the Lyman alpha emission from the same galaxies can be observed at this distant era, so we can use it as a signature of ionizing radiation. However, the correlation is far from straightforward.

We are therefore trying to build a model that can predict the ionizing emission from a distant galaxy given its observable properties, such as its mass, size, and Lyman alpha emission. We have a large dataset of about 2000 galaxies, each with a set of known properties. So far, we have used multiple linear regression to build such a model, and it performs reasonably well. However, there are some issues with the degeneracy of the parameters (multicollinearity) and with the accuracy of the predictions.

In this clinic, we ask the community for advice on whether there is a better method for building such a predictive model.
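To make the multicollinearity issue mentioned in this abstract concrete, the sketch below diagnoses it with variance inflation factors and compares ordinary least squares with ridge regression, one commonly used option among several. It is only an illustration and not the speaker's model: the data are synthetic stand-ins, and the variable names (`log_mass`, `log_size`, `log_lya`), the coefficients, and the penalty `alpha=50.0` are assumptions chosen for the example.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF of each column: 1 / (1 - R^2) from regressing it on the other columns."""
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        residual = y - others @ coef
        r2 = 1.0 - residual.var() / y.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression on standardized predictors.

    The intercept is handled by centering y, so it is not penalized.
    alpha = 0 reduces to ordinary least squares.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    p = Z.shape[1]
    beta = np.linalg.solve(Z.T @ Z + alpha * np.eye(p), Z.T @ (y - y.mean()))
    return beta, y.mean()

# Synthetic stand-in for a galaxy catalogue (hypothetical values, not real data).
rng = np.random.default_rng(0)
n = 2000
log_mass = rng.normal(9.0, 0.5, n)
log_size = 0.3 * log_mass + rng.normal(0.0, 0.02, n)  # nearly determined by mass
log_lya = rng.normal(41.0, 0.4, n)
X = np.column_stack([log_mass, log_size, log_lya])
y = 1.0 * log_mass + 0.5 * log_lya + rng.normal(0.0, 0.3, n)

vifs = variance_inflation_factors(X)
beta_ols, _ = ridge_fit(X, y, alpha=0.0)
beta_ridge, _ = ridge_fit(X, y, alpha=50.0)

print("VIF per predictor:", vifs)
print("OLS coefficients (standardized):  ", beta_ols)
print("Ridge coefficients (standardized):", beta_ridge)
```

A large VIF flags predictors that are almost linear combinations of the others. Ridge regression trades a small bias for more stable coefficients in that situation; in practice the penalty would be chosen by cross-validation rather than fixed by hand.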

12h55-13h20: Benoit Girard, Faculty of Medicine, Department of Basic Neurosciences

"Computational approaches for social ethology in rodents"

Our society is affected by various neuropsychiatric disorders characterized by social deficits. While naturalistic behaviors have a rich, complex and dynamic repertoire, most current clinical and preclinical research is based on restricted and redundant paradigms defined by a few easily quantifiable and well-characterized behaviors. In the laboratory of Prof. Camilla Bellone, we are working on the elucidation of neural circuits and molecular mechanisms underlying neuropsychiatric disorders and, more particularly, autism spectrum disorder. We plan to use machine learning approaches to process high-dimensional data in order to establish a deep behavioural profile of social behaviours in preclinical models associated with psychiatric disorders.

Current advances in computer vision and machine learning offer new possibilities for a deep analysis of social behaviours starting from recorded video. From a data point of view, pose estimation allows the extraction of a complex three-dimensional skeleton composed of moving vectors. But this is only a partial solution to the problem of analyzing and classifying complex social behaviors.

To limit the use of unreliable, non-reproducible, non-scalable and imprecise human interventions, we propose to develop an unsupervised approach for the automatic computational ethology of social behaviors. Detailed ethograms of social behaviors are essential not only to advance basic knowledge, but also to assess and categorize psychiatric disorders and to develop brain-machine interfaces. This technology will help neuroscientists and clinician-researchers assess social behaviours across species, fields and diseases. In addition, this approach could be applied in various fields involving the analysis of discrete states in multivariate time series.
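As a toy illustration of what "discrete states in multivariate time series" can mean, the sketch below clusters per-frame features into states with plain k-means. It is not the laboratory's method: the two features (an inter-animal distance and a speed), the state means, and the choice of k-means itself are assumptions made for the example, and the data are synthetic.

```python
import numpy as np

def kmeans(features, k, n_iter=50):
    """Plain k-means with a deterministic farthest-point initialization."""
    centers = [features[0]]
    for _ in range(k - 1):
        # Distance of every frame to its nearest already-chosen center.
        dist = np.min([np.linalg.norm(features - c, axis=1) for c in centers], axis=0)
        centers.append(features[np.argmax(dist)])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic recording: 400 frames, 2 hypothetical features per frame
# (inter-animal distance, speed), switching between 3 hidden states.
rng = np.random.default_rng(1)
state_means = np.array([[30.0, 1.0], [5.0, 0.5], [15.0, 8.0]])
true_states = np.repeat([0, 1, 2, 0], 100)
frames = state_means[true_states] + rng.normal(0.0, 0.5, size=(400, 2))

labels, centers = kmeans(frames, k=3)

# Frame indices where the inferred state changes: a crude ethogram segmentation.
change_points = np.flatnonzero(np.diff(labels)) + 1
print("inferred state changes at frames:", change_points)
```

Real pose data would need many more features, some temporal context, and a principled way to choose the number of states; the point here is only that an unlabelled feature stream can be segmented into discrete behavioural states without human annotation.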

Although the laboratory has strong knowledge and a strong network in neurobiology, animal behaviour and clinical psychiatry, collaboration with complementary fields would be a major advantage for this project. Insights from zoologists and ethologists would be an asset and, above all, computer scientists specialized in computer vision and deep learning would be essential to accelerate the development of reliable and accurate models.


13h20-13h45: Elliot Romano, Faculty of Sciences, Department F.-A. Forel for Environmental and Aquatic Sciences

"HOROCARBON: Assessing the real-time carbon footprint of electricity consumption in Switzerland and abroad"

Various assessment methods exist to evaluate the carbon footprint of electricity consumption; they usually rely on green certification or average mix values. The current methodologies therefore lack accuracy when it comes to estimating the true carbon intensity of electricity consumption in an open economy interconnected with the European grid. Based on recent academic work, the carbon-meter project is of particular relevance for assessing the impact of electricity-driven processes with a strong seasonal consumption pattern, such as in the building (e.g. heat pumps) or mobility (batteries) sectors. The project will provide hourly values of the carbon footprint of Swiss electricity consumption and account for imports and transits from neighboring countries. Moreover, the methodology is based on an approach that is appropriate for capturing the impact of electricity market integration.

The project is based on the effective exploitation of new real-time data. A big data solution is a main concern in this project, as it requires continuous data acquisition from generation plants, transits, imports and electricity demand in Switzerland and abroad. Results on carbon footprints will be available for everyone to access on a dedicated digital platform. They can thus be used as a signal by the different stakeholders to coordinate their respective actions with respect to carbon savings.

  • Industry:
    • Invest in the most efficient carbon-saving technologies
    • Inform consumers about the real impact of their consumption
  • Policy makers:
    • Adopt appropriate environmental policy according to the observed carbon intensity
    • Provide financial subsidies according to CO2 signals
  • Academics:
    • Benchmark the carbon-saving efficiency of new electricity-driven processes
    • Provide simulations of CO2 emissions to academics (JASM)

While our current research focuses on the carbon footprint at the Swiss domestic level, taking into account imports and exports with its direct neighbours, our future challenge is to provide a global picture of the carbon footprint tied to electricity consumption for each European country. For such future work, multiple interactions in the assessment due to imports and transits need to be considered.
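As a rough illustration of the hourly accounting idea described in this abstract, the sketch below computes the carbon intensity of one hour of consumption as the energy-weighted mean of domestic generation and imports. It is a simplification under stated assumptions and not the project's methodology: each neighbour's intensity is treated as a known fixed number (so transits and feedback between countries are ignored), exports are assumed to carry the same pooled mix, and every emission factor and energy value is a placeholder rather than a measured figure.

```python
from dataclasses import dataclass

@dataclass
class HourlyFlows:
    """Energy flows for one hour, in MWh (hypothetical structure)."""
    domestic_generation_mwh: dict  # technology name -> MWh generated
    imports_mwh: dict              # neighbour name  -> MWh imported

def consumption_intensity(flows, tech_factors, neighbour_intensity):
    """Energy-weighted carbon intensity (gCO2eq/kWh) of the pooled supply.

    tech_factors: technology -> emission factor in gCO2eq/kWh
    neighbour_intensity: neighbour -> assumed intensity of its exports, gCO2eq/kWh
    """
    domestic_energy = sum(flows.domestic_generation_mwh.values())
    domestic_emissions = sum(
        mwh * tech_factors[tech]
        for tech, mwh in flows.domestic_generation_mwh.items()
    )
    import_energy = sum(flows.imports_mwh.values())
    import_emissions = sum(
        mwh * neighbour_intensity[country]
        for country, mwh in flows.imports_mwh.items()
    )
    total_energy = domestic_energy + import_energy
    if total_energy == 0:
        raise ValueError("no supply recorded for this hour")
    return (domestic_emissions + import_emissions) / total_energy

# Placeholder numbers for a single hour (not real emission factors or flows).
tech_factors = {"hydro": 10.0, "nuclear": 12.0}
neighbour_intensity = {"neighbour_A": 400.0, "neighbour_B": 60.0}
hour = HourlyFlows(
    domestic_generation_mwh={"hydro": 3000.0, "nuclear": 2000.0},
    imports_mwh={"neighbour_A": 1000.0, "neighbour_B": 500.0},
)

with_imports = consumption_intensity(hour, tech_factors, neighbour_intensity)
domestic_only = consumption_intensity(
    HourlyFlows(hour.domestic_generation_mwh, {}), tech_factors, neighbour_intensity
)
print(f"domestic mix only: {domestic_only:.1f} gCO2eq/kWh")
print(f"including imports: {with_imports:.1f} gCO2eq/kWh")
```

Because each neighbour also imports from its own neighbours, its export intensity is not really a fixed input. Extending the picture to every European country, as the abstract anticipates, would mean solving for all countries' intensities jointly rather than one at a time.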