Learning object representations to predict and explain missing associations

Date
2025

Advisor(s)

Paccanaro, Alberto

Abstract
We consider a dataset of associations between pairs of objects, where each object belongs to one of two types (e.g. drugs and side effects). In some cases, additional information about the objects is also available (e.g. drug structures). Our aim is to learn object representations that can be used to predict missing associations and that encode interpretable object attributes capable of explaining the predictions. In this thesis, we propose three approaches and apply them to real-world datasets, primarily from pharmacology, which differ in the type of data available.

The first approach, Self-Matrix Factorization (SMF), learns object representations using solely association data. Exploiting the fact that, in general, objects lie in multiple linear manifolds embedded in a high-dimensional space, SMF learns similarities between objects (specifically, between those that share a manifold) directly from the observed associations. SMF thus learns object similarities and representations simultaneously, constraining them to reflect underlying structures in the data. We tested SMF extensively on association datasets containing user-item ratings and drug side effect frequencies. Our results show that SMF outperforms competing methods in recovering missing associations and is also better at learning representations that capture meaningful object attributes.

In our second learning scenario, no explicit associations between objects are available, only the perturbations that the objects (drugs and viruses) induce in an environment (a protein-protein interaction network). Our approach learns the object representations through the simultaneous decomposition of several matrices. We show that these representations encode interpretable attributes of the objects involved (drugs, viruses, etc.), that they can be used to predict effective antiviral treatments, and that these predictions are explainable in terms of the learned object attributes.

Finally, in our third learning scenario, object associations (between drugs and side effects) are available together with some low-level object features (drug molecular graphs). We developed an approach in which object representations are learned by a deep learning model called Features to Signatures (F2S), and we show that these representations can be used to predict drug side effect frequencies from molecular graphs. Importantly, F2S can be used for ab initio prediction, that is, to predict side effect frequencies for compounds for which no side effects have yet been detected.
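
As an illustration of the general idea of predicting a missing association from learned object representations, the following is a minimal sketch of plain regularized matrix factorization trained by stochastic gradient descent. It is not SMF, the multi-matrix decomposition, or F2S as developed in the thesis; the function names, hyperparameters, and the toy drug and side effect values are all assumptions made up for this example.

```python
import random

def factorize(observed, n_rows, n_cols, rank=2, lr=0.05, reg=0.01,
              epochs=500, seed=0):
    """Learn one vector per row object and one per column object so that
    their dot product approximates each observed association value.
    Generic matrix factorization (assumption: NOT the thesis's SMF)."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    entries = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(entries)
        for (i, j), value in entries:
            err = value - predict(P, Q, i, j)
            for k in range(rank):
                p_ik, q_jk = P[i][k], Q[j][k]
                # Gradient step on squared error with L2 regularization.
                P[i][k] += lr * (err * q_jk - reg * p_ik)
                Q[j][k] += lr * (err * p_ik - reg * q_jk)
    return P, Q

def predict(P, Q, i, j):
    """Predicted association value: dot product of the two representations."""
    return sum(p * q for p, q in zip(P[i], Q[j]))

def mse(P, Q, observed):
    """Mean squared error over the observed entries only."""
    return sum((v - predict(P, Q, i, j)) ** 2
               for (i, j), v in observed.items()) / len(observed)

# Hypothetical toy data: 4 drugs x 3 side effects, values are made-up
# frequency classes. The pair (3, 2) is deliberately left unobserved.
observed = {
    (0, 0): 4.0, (0, 1): 2.0, (0, 2): 1.0,
    (1, 0): 4.0, (1, 1): 2.0, (1, 2): 1.0,
    (2, 0): 1.0, (2, 1): 3.0, (2, 2): 4.0,
    (3, 0): 1.0, (3, 1): 3.0,
}

P0, Q0 = factorize(observed, 4, 3, epochs=0)   # untrained baseline
P, Q = factorize(observed, 4, 3)               # trained representations
missing_score = predict(P, Q, 3, 2)            # score for the unobserved pair
```

Here `missing_score` is the model's estimate for the held-out pair; comparing `mse(P0, Q0, observed)` with `mse(P, Q, observed)` shows how much the fit to the observed entries improves with training. The rows of `P` and `Q` play the role of object representations, although in this plain formulation nothing encourages them to be interpretable.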
