Graph Neural Networks
11 November 2022 - 12h15-14h00
Uni Mail MR060 & Online
Registration mandatory - Under this link
Graph Neural Networks (GNNs) are a class of artificial neural network methods designed to perform inference on data that can be represented as graphs. Owing to their convincing performance, GNNs have recently become a widely applied graph analysis method. Indeed, graphs provide a simple yet powerful tool to describe complex systems: a problem is represented as a set of objects (nodes) along with a set of interactions (edges) between pairs of these objects. Relevant application domains for GNNs include social networks, citation networks, molecular biology and physics, where they have been shown to lead to more accurate and robust models.
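As a minimal sketch of the node-and-edge representation described above, the following Python snippet builds a toy graph and runs one round of neighbour averaging, which is the basic idea behind message passing in GNNs. The node names, feature values and the function names `neighbours` and `message_passing_step` are hypothetical and chosen purely for illustration. A real GNN would additionally apply learned weights and non-linearities at each step.

```python
# Toy undirected graph: each node carries one scalar feature,
# and each edge is a pair of node identifiers (all values are illustrative).
node_features = {"a": 1.0, "b": 2.0, "c": 4.0}
edges = [("a", "b"), ("b", "c")]


def neighbours(node, edges):
    """Return the nodes that share an edge with `node` (edges are undirected)."""
    result = []
    for u, v in edges:
        if u == node:
            result.append(v)
        elif v == node:
            result.append(u)
    return result


def message_passing_step(node_features, edges):
    """Replace each node's feature by the mean of its own and its neighbours' features."""
    updated = {}
    for node, value in node_features.items():
        values = [value] + [node_features[n] for n in neighbours(node, edges)]
        updated[node] = sum(values) / len(values)
    return updated


updated_features = message_passing_step(node_features, edges)
print(updated_features)
```

After one step, each node's feature already reflects its immediate neighbourhood. Repeating the step lets information travel further along the edges.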
Through concrete examples drawn from their research, the speakers at this seminar will present their use of GNNs and some of the associated limitations. Using the example of the Large Hadron Collider (LHC), the presentations will highlight some of the key advantages of GNNs over other ML models. Drawing on applications to clinical trial risk assessment, they will also show how the way the input graph is constructed impacts predictive power. Finally, through examples related to phylogenetic profiling, they will illustrate how broad the applications of GNNs can be, as they can be deployed on heterogeneous graphs including interaction, evolutionary and functional information to represent protein families and their relationships.
Program
Graph-based data representation & transformers in particle physics
Tobias Golling, Faculty of Science, Department of Particle Physics (DPNC)
Modern machine learning (ML) incorporates domain knowledge directly into model building to obtain performant, interpretable and reusable ML modules. Most real-world data, including many physical systems, are unordered and of variable size. A graph supports arbitrary relational structure between an arbitrary number of unordered elements, called nodes. Information is communicated through the presence or absence of pairwise connections (edges). Graph-based models combine two key advantages over other ML models: they allow the physics to be imprinted in the very flexible structure of graphs, and they provide additional flexibility in the representation of properties of nodes, edges, and the graph as a whole. Applications in state-of-the-art particle physics research from the ATLAS experiment at CERN’s LHC will be presented to demonstrate the potential of graph data representation, with transformers as one particularly powerful realisation.
Hierarchical document classification using graph neural networks: an application to clinical trial risk assessment
Douglas Teodoro, Faculty of Medicine, Department of Radiology and Medical Informatics
We consider the hierarchical representation of documents as graphs and use geometric deep learning to classify them into different categories. While GNNs can efficiently handle the variable structure of hierarchical documents using permutation-invariant message passing operations, we show that we can achieve additional performance improvements using a selective graph pooling operation that arises from the fact that some parts of the hierarchy are invariant across different documents. We applied our model to predict the risk of clinical trials (CTs) based on the design protocol, achieving robust predictive performance on a large-scale publicly available CT dataset of approximately 360K protocols. We also show that while the use of GNNs to solve this classification task is successful, the way the input graph is constructed impacts the predictive power and that it may be beneficial to make a priori useful information more explicit. We further demonstrate how the graph model can be exploited to obtain insights into the CT risk factors.
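To make the notions of permutation invariance and selective pooling more concrete, here is a small Python sketch. It is not the speakers' model: the section labels, feature values and the functions `sum_pool` and `selective_pool` are assumptions introduced only for illustration. The sketch shows that a sum readout gives the same result whatever the node order, and that pooling can be restricted to a chosen subset of the document hierarchy.

```python
# Hypothetical document graph: each node is a section with a label and a
# small feature vector (labels and numbers are made up for illustration).
nodes = [
    {"section": "title", "features": [1.0, 0.0]},
    {"section": "eligibility", "features": [0.5, 2.0]},
    {"section": "outcomes", "features": [1.5, 1.0]},
]


def sum_pool(nodes):
    """Permutation-invariant readout: element-wise sum of all node features."""
    dim = len(nodes[0]["features"])
    return [sum(n["features"][i] for n in nodes) for i in range(dim)]


def selective_pool(nodes, keep_sections):
    """Pool only over nodes whose section label is in `keep_sections`.

    Assumes at least one node matches; this is an illustrative stand-in,
    not the pooling operation used in the talk.
    """
    selected = [n for n in nodes if n["section"] in keep_sections]
    return sum_pool(selected)


full_readout = sum_pool(nodes)
shuffled_readout = sum_pool(list(reversed(nodes)))  # same nodes, different order
selected_readout = selective_pool(nodes, {"eligibility", "outcomes"})
print(full_readout, shuffled_readout, selected_readout)
```

Because the readout ignores node order, documents whose sections appear in a different sequence map to the same representation, while the selective variant focuses the readout on the parts of the hierarchy one chooses to keep.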
Phylogenies and coevolution as a graph labeling problem
Ana Claudia Sima, Research Scientist at the SIB Swiss Institute of Bioinformatics
David Moi, DBC at the University of Lausanne and the Dessimoz lab
Phylogenetic profiling tools are often used to study protein-protein interaction networks without using in vivo data. These computational approaches compare the evolutionary signatures of two protein families to determine whether they follow the same trajectory. Such a coevolutionary signal often indicates an underlying interaction of biological importance between the two protein families. By representing the phylogeny of both families as directed acyclic graphs, we can take this approach one step further and train CGNs to distinguish likely interactions from non-interacting pairs of families, as well as to determine where in the species tree the two are interacting. In addition to this project, which has seen favorable preliminary results, we have begun work on deploying CGNs on heterogeneous graphs including interaction, evolutionary and functional information to represent protein families and their relationships. Our goal is to augment these RDF-based representations, which are spread out among multiple databases, with CGN-inferred features across SIB resources.
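The comparison of evolutionary signatures mentioned above can be illustrated with the classical presence/absence form of phylogenetic profiling. This is a simplified baseline, not the graph-based method of the talk. The species list, the family names, the profiles and the function `jaccard_similarity` are hypothetical, and Jaccard similarity is only one of several possible ways to compare profiles.

```python
# Hypothetical presence (1) / absence (0) profiles of three protein families
# across four species; all names and values are made up for illustration.
species = ["human", "mouse", "yeast", "e_coli"]
profiles = {
    "family_A": [1, 1, 0, 1],
    "family_B": [1, 1, 0, 1],
    "family_C": [0, 1, 1, 0],
}


def jaccard_similarity(p, q):
    """Fraction of species where both families occur, among those where either occurs."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0


sim_ab = jaccard_similarity(profiles["family_A"], profiles["family_B"])
sim_ac = jaccard_similarity(profiles["family_A"], profiles["family_C"])
print(sim_ab, sim_ac)
```

Families with near-identical profiles, such as the two first ones here, would be flagged as candidates for a functional interaction. Working on the full phylogeny rather than a flat profile can additionally indicate where in the species tree the signal arises.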