Finding solutions to methodological challenges through interdisciplinary collaboration
13 December 2021 - 12h15-13h15
Auditoire de Battelle and Online - Registration mandatory
The Data Science Competence Center (CCSD) of the University of Geneva is pleased to invite you to the third session of the Data Science Clinics.
The objective of the Data Science Clinics is to allow researchers to share with their peers some of the concrete challenges they encounter in the analysis of their research data. This sharing of methodological questions aims to collectively shape solutions to these challenges and to pave the way to the development of mutually beneficial collaborations between researchers.
Program
12h15-12h45: Jens-Kristian Krogager, Department of Astronomy, Faculty of Sciences.
"Graph Databases for Complex Astrophysical Data"
Astronomical spectroscopic surveys are becoming ever larger and provide richer, more detailed data sets. This wealth of data allows us to identify a plethora of spectral features that were previously hidden in the noise. In this presentation, I will highlight a specific case from my research into the chemical enrichment of galaxies with the upcoming 4MOST survey instrument. Traditionally, such features are stored in relational databases; however, with more detailed data we are now able to identify rare features that show up in less than 1% of our data. The traditional way to store and retrieve such features would be to create a dedicated 'feature table' that can be joined to its parent data set. With the expected amount of data, we will have many such rare features, and the queries will then require many table joins, which become inefficient as the data set grows. Moreover, the fixed schema of a relational database is not easily extensible and may thus force us to think inside the box. My proposed solution is a switch to graph databases, which allow more flexible data ingestion and queries. I would therefore like to hear from the experts how feasible this is at large scale and how it would compare to traditional relational databases.
12h45-13h15: Chiara Rizzi, Particle Physics Department, Faculty of Sciences.
"CNN to identify and reconstruct di-photon events with the new FASER pre-shower"
The new FASER pre-shower is a proposed upgrade of the FASER experiment at the LHC, which will search for new long-lived particles. Currently, FASER is well equipped to detect particles decaying into charged leptons, but it is not able to identify particles decaying into two photons. The new pre-shower will make it possible to explore this decay channel as well, greatly enlarging the set of models to which FASER will be sensitive. The baseline design for the pre-shower consists of six layers of converter and pixel detectors. Each pixel layer will register a snapshot of the evolution of the photon shower at that point of the detector. A machine-learning approach to identifying di-photon events, based on convolutional neural networks, is being investigated. This approach treats the problem as an imaging one, exploiting the similarities between the structure of the pixel layers and an image. This talk will describe the input data, the status and challenges of the current algorithm, and the prospects for using it to perform a full reconstruction of the photons.
12h45-13h15: Ben Krikler, University of Bristol and RemotelyGreen.
"The RemotelyGreen Calculator: Estimating carbon emissions due to video conferencing"
Travel is a major contributor to global warming and, for many academic institutions, the single largest source of emissions. Replacing travel with an online meeting might seem like a natural way to reduce this, but exactly how much better can it be? Are there even occasions where travel, e.g. by train, might actually be environmentally better? Producing a reliable, accurate, open, and user-friendly tool to help make such comparisons is the goal of the RemotelyGreen Calculator, which will, as a first step, estimate CO2 emissions for an online event based on user inputs. When compared to similar results for travel, such a tool will give individuals, event organisers, and institutions the power to make data-driven decisions on when to travel and when to go online. Making the calculator accurate requires knowledge of each user's setup; the IT infrastructure being used to run the event online; the emissions intensity of power production in the relevant regions; and the embodied emissions associated with all the hardware. But this data is very hard to come by: it is either proprietary, out of date and incomplete, or inherently or intentionally inaccurate. How can we build around these challenges while still meeting our goals? Is new data required? What open data sources exist?