Data Science Seminars

Exploring data quality

11 November 2020 - 12h15-14h00

Online - registration mandatory

Abstract

 

Data quality is at the heart of any scientific process. In many ways, it conditions the relevance of research results. As such, researchers deploy multiple strategies to increase this quality, whether through the refinement of their data collection instruments or their methods of analysis. Data quality can thus be seen as a process of continuous improvement.

Through concrete examples drawn from their research, the speakers at this conference will propose keys to understanding and improving data quality, while critically questioning this very notion. Indeed, the notion of quality itself needs to be explored, since in some respects it appears to be shaped by specific scientific contexts and epistemological conditions. In this sense, exploring data quality, by challenging it in the light of inconvenient facts or mixed methods, has the potential to open fascinating avenues for research.

 

Program

 

A Statistician's View on Data Quality

Diego Kuonen, Geneva School of Economics and Management (GSEM), Research Center for Statistics.

What are data? What is quality? What is data quality? What is (data) quality management? Which are the most important moments in a piece of data's life? What does this have to do with continuous improvement? Why are statistical principles and rigour necessary? This presentation will cover these questions and more, based on real-world experience and more than 19 years of professional statistical practice.

 

Bug or feature? The importance of data quality in physics

Martin Kunz, Faculty of Sciences, Geneva Cosmology and Astroparticle Physics Group.

In physics, experiments are carefully designed to deliver the best possible data in order to discover new phenomena. But of course this does not always work out precisely as planned. In my talk, I will take a look at the analysis of the European Planck satellite data to illustrate some strategies that we used to improve data quality and to help find problems. I will also briefly discuss how sometimes unexpected results lead to new discoveries, and how 'trust' in the data plays a crucial role in this context.

 

Data quality and data sets in the social sciences: thinking behind the scenes

Juliet Fall, Geneva School of Social Sciences, Department of Geography and Environment.

This contribution discusses how the processes and debates surrounding the design of a large database shed light on what quality might mean in the social sciences. It builds upon research carried out within an international, multi-disciplinary team working across political science, geography and critical security studies. In order to analyse large sets of textual discourse using corpus linguistics, we built an open-access database of diplomatic speeches and interventions at the United Nations Security Council (2007-2018) using automated PDF file conversions, web-scraping and manual verification. Although we succeeded in building a unique and powerful database, a number of crucial epistemological issues remain. Even when data quality is high, there is a lot that this type of data, which claims to capture complex social and political processes, simply cannot capture: questions of translation and of the cultural and situated use of language, bodily performances, contexts, as well as what happened before these speeches were given, including closed-door preparatory meetings and behind-the-scenes negotiations.