Data Science Seminars

When nothing is easy:
dealing with heterogeneous data and interoperability issues

11 March 2021, 12:15-14:00

Online - registration mandatory



All over the world, data is being collected faster and in ever-increasing quantities every day, opening up unprecedented potential for broad new access to knowledge. However, extracting knowledge from this data brings with it many technical and methodological challenges. These issues are obviously related to the sheer volume of these data sets, but they are also, and above all, induced by their complexity and heterogeneity. As such, it is often the question of data interoperability that stands out for many as an insurmountable, or at least time-consuming, obstacle. Heterogeneity occurs on a multitude of scales: international, internal to certain institutions or research groups, or even sometimes specific to individuals over time. There is no 'one size fits all' solution to it, and the least that can be said is that, in the face of these challenges, nothing is easy.

Through concrete examples drawn from their research, the speakers at this seminar will share the challenges they have faced in managing and analyzing heterogeneous data, and the solutions they have developed to make interoperability possible. These presentations will underline that the very notion of heterogeneity does not describe the same situation everywhere and can arise from very different epistemological conditions, notably marked by issues that sometimes go beyond purely scientific considerations, as is the case at the Geneva University Hospitals. The need for interoperability is also negotiated differently at the local and the global level, leading to the development of innovative solutions to allow the transition from one to the other, as is the case for Earth Observation Data Cubes. Finally, once interoperability has been achieved, the question of the reproducibility of data integration processes arises. We will also see during this seminar that, in the face of these final challenges, sometimes taken into account (too) late in the research cycle, specific tools and methods can be of significant help.




Speak like a clinician: bridging the gap between controlled vocabularies and medical language

Christophe Gaudet-Blavignac, Faculty of Medicine, Department of Radiology and Medical Informatics.

Interoperability is a well-known challenge in healthcare. Efforts to represent medical information go back to the nineteenth century. While numerous technical and semantic standards exist, the struggle to convert real-world heterogeneous clinical data into actionable knowledge fit for sharing and secondary use is still a reality. Part of this challenge lies in the fact that the needs of caregivers in terms of documentation and controlled vocabularies are not met by existing classification systems. Moreover, no medical classification can cover the complete field of medicine. To address this challenge, a new approach must be proposed to bring semantics to clinical data.

The Division of Medical Information Sciences develops innovative methods in medical semantics, focusing on applied sciences. Specifically, this talk will present a new approach to semantically enhancing clinical data while respecting the medical language used by clinicians, together with the results of its implementation at the Geneva University Hospitals.


The Swiss Data Cube: enhancing Earth Observations data interoperability with Analysis Ready Data

Gregory Giuliani, Institute for Environmental Sciences.

The rapid changes in our environment and the emergence of big data call for innovative solutions to support policy frameworks and related actions toward sustainable development. To address these changes, the University of Geneva and the University of Zurich have joined forces with UNEP/GRID-Geneva and the Swiss Federal Institute for Forest, Snow and Landscape Research (WSL) to unleash the power of Big Data for monitoring the environment with a new technology: the Swiss Data Cube (SDC).

The SDC represents a new paradigm, revolutionizing the way users can interact with complex data originating from many different sources, in particular from Earth observation (EO) instruments such as satellite imagery. It enhances connections between data, applications and users, facilitating the management, access and use of Analysis Ready Data (ARD). ARD enables data interoperability, facilitating reproducible analysis and harnessing big EO data at minimal cost and effort.


Semantic web techniques to specify and implement data integration processes

Gilles Falquet, Geneva School of Economics and Management, Information Science Institute.

The development of increasingly sophisticated scientific and technical applications requires access to numerous and heterogeneous data sources. Despite the development of standards for data formats (CSV, XML, JSON, etc.) and for their conceptual description (XML schemas, UML diagrams, formal ontologies in RDFS and OWL), the integration or interconnection of data sources remains a complex process that requires a great deal of domain knowledge and the use of different software tools.

The Semantic Web offers a framework composed of a graph data model (RDF), logical languages (RDFS and OWL) for conceptual modelling and logical inference, and a graph query and transformation language (SPARQL). Based on practical examples, we will show that Semantic Web concepts and associated software tools can be used to formally specify and implement data integration processes, seen as graph transformations.
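As a toy illustration of this idea, integration as graph transformation can be sketched in plain Python: RDF-style triples from two heterogeneous sources are rewritten into a shared vocabulary by mapping rules, in the spirit of a SPARQL CONSTRUCT query. This is not the speakers' actual tooling; the triples, the `a:`/`b:`/`ex:` vocabularies, and the station names are invented for the example.

```python
# Toy sketch: data integration as graph transformation (SPARQL CONSTRUCT-like).
# All vocabularies and identifiers below are made up for illustration.

# Two heterogeneous sources, already lifted into (subject, predicate, object) triples.
source_a = [
    ("station/1", "a:temp_celsius", 21.5),
    ("station/1", "a:label", "Geneva"),
]
source_b = [
    ("sensor-42", "b:tempF", 70.7),
    ("sensor-42", "b:name", "Zurich"),
]

def construct(triples, rules):
    """Apply (predicate -> transform) rules, mimicking a CONSTRUCT query:
    each matching triple is rewritten into the target vocabulary."""
    return [rules[p](s, o) for s, p, o in triples if p in rules]

# Mapping rules from each source vocabulary into a shared target vocabulary ("ex:").
rules_a = {
    "a:temp_celsius": lambda s, o: (s, "ex:temperatureC", o),
    "a:label":        lambda s, o: (s, "ex:placeName", o),
}
rules_b = {
    # Source B reports Fahrenheit; the rule converts units during integration.
    "b:tempF": lambda s, o: (s, "ex:temperatureC", round((o - 32) * 5 / 9, 1)),
    "b:name":  lambda s, o: (s, "ex:placeName", o),
}

# The integrated graph: both sources expressed in one vocabulary.
integrated = construct(source_a, rules_a) + construct(source_b, rules_b)

# A simple query over the integrated graph becomes trivial.
temps = {s: o for s, p, o in integrated if p == "ex:temperatureC"}
print(temps)  # {'station/1': 21.5, 'sensor-42': 21.5}
```

Because the mapping rules are declarative and kept separate from the data, rerunning the same specification on new source data reproduces the integration, which is the reproducibility point made below.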

By formally specifying these processes in a relatively simple way, we can expect a clear improvement in their reproducibility in other environments.