Multiscale data geometric networks for learning representations and dynamics of biological systems

  • Bowden, Richard R. (CoI)
  • Saha, Avishkar A. (CoI)
  • Wolf, Guy (CoPI)
  • Krishnaswamy, Smita (PI)
  • Adelstein, Ian (CoPI)
  • Perlmutter, Michael (CoPI)

Project: Research

Project Details

Description

Recent years have seen significant volumes of high-throughput, high-dimensional biomedical data arising from single-cell sequencing technologies. Further, in contrast to past data of this type, current practices involve the collection of multiple single-cell datasets (e.g., one for each patient in a large cohort), which can represent data over time or in different conditions and can be analyzed computationally as point clouds. There is a great need for new mathematical and machine learning techniques that can process these types of complex data to gain meaningful and predictive insight into healthy and disease processes. While the majority of machine learning techniques used in the biomedical domain have been supervised techniques arising from language or vision (image) models, this project focuses on developing multiscale geometric and topological representations of more complex data structures. Such representations allow us to combine advances in several fields at the forefront of data science, including geometric deep learning, manifold learning, and harmonic analysis, in order to analyze and predict from this data in an interpretable way. The research will include several biomedical applications, such as characterizing immune response in COVID-19, tracking the progress of metastatic cancer, predicting the effectiveness of immunotherapy, and understanding differentiation. Furthermore, addressing these challenges will enable new advances in a wide range of fields where complex high-throughput data is collected in varying experimental environments. The project will provide representation learning techniques to explore and featurize high-dimensional and complex data types, including point clouds and graphs collected in a variety of conditions, in order to perform machine learning tasks.
Thrust 1 will involve the development of data geometric features to characterize point cloud data, based on which a novel class of neural networks will be created for regression on single-cell data from a variety of systems. Thrust 2 will focus on methods for preserving directed information by constructing asymmetric kernels and using these kernels for embedding, inference, and feature prediction. This will lead to the creation of directed graph neural networks that utilize geometric scattering as defined on a directional graph Laplacian of point cloud data. These networks will be used to learn and process data from gene regulatory and metabolic networks. Thrust 3 will focus on inferring continuous dynamics by interpolating from static snapshot single-cell data using optimal transport-regularized neural ODEs and PDEs, and on producing interpretations of the underlying generative models. Further, it will involve representing dynamics quantitatively using data geometry and topology for prediction and classification, and will be validated on cancer and calcium signaling data from epithelial cells. The techniques developed here will provide fundamental advances in the use of neural networks to represent and make predictions on point cloud data, as well as enable new ways to tackle the problem of tracking dynamic biological processes over them.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Status: Active
Effective start/end date: 1/01/20 to 31/08/26

Funding

  • National Science Foundation: $498,229.00
