Giulia Capitoli (Vedano al Lambro / IT), Vanna Denti (Vedano al Lambro / IT), Michele Costanzo (Naples / IT), Lucia Santorelli (Pozzuoli / IT)
Continuous advances in measurement technologies have made multiple types of omics data available, offering a comprehensive view of biological systems. Our research focuses on integrating multi-omics data, a crucial step for improving biomarker discovery and for studying the interplay between molecular layers. Developing computationally scalable algorithms and statistical models that yield accurate biological insights is therefore essential.
Modern mass spectrometry (MS) techniques measure the abundance of a wide range of molecules in a given biological sample. MS imaging (MSI) additionally produces spatially resolved data in which each pixel contains a mass spectrum. This spatial information reveals the localization of biomolecules (including lipids, N-glycans, and tryptic peptides) within the tissue, enabling the identification of patterns that may not be visible in traditional morphological images.
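To make the MSI data structure concrete, the sketch below unfolds an imaging data cube into a molecules-by-pixels matrix and builds a graph linking neighboring pixels. It assumes a rectangular pixel grid, row-major pixel order, and 4-neighbor adjacency; the function names are hypothetical and not part of the authors' software.

```python
import numpy as np

def unfold_cube(cube):
    """Reshape an (height, width, n_mz) MSI cube into an (n_mz, n_pixels) matrix.

    With NumPy's default row-major order, pixel (r, c) becomes column r * width + c.
    """
    h, w, n_mz = cube.shape
    return cube.reshape(h * w, n_mz).T

def grid_adjacency(h, w):
    """Dense 4-neighbor adjacency matrix of an h x w pixel grid (assumed layout)."""
    n = h * w
    W = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            p = r * w + c
            if c + 1 < w:  # link to the pixel on the right
                W[p, p + 1] = W[p + 1, p] = 1.0
            if r + 1 < h:  # link to the pixel below
                W[p, p + w] = W[p + w, p] = 1.0
    return W

# Toy example: a 2 x 3 image with 4 m/z channels.
cube = np.arange(24, dtype=float).reshape(2, 3, 4)
X = unfold_cube(cube)      # shape (4, 6)
W = grid_adjacency(2, 3)   # shape (6, 6), 7 undirected edges
```

The dense adjacency matrix grows quadratically with the number of pixels, so real MSI datasets would call for a sparse representation.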
Traditional statistical methods do not fully account for the spatial dependencies among pixels or for the relationships within molecular networks. To address this, we developed a multi-omic co-clustering statistical model, based on non-negative matrix tri-factorization (NMTF), that partitions the spatial expression profiles of multiple molecular layers. The model accounts for the spatial dependencies between neighboring pixels when inferring the latent block structure of the data, yielding two clusterings: 1) molecules, grouped by their tissue-wide expression, and 2) image regions, grouped from pixels while taking their coordinates into account.
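The co-clustering idea can be illustrated with a minimal sketch of graph-regularized NMTF, X ≈ F S G^T, fitted by multiplicative updates. This is not the authors' model: the Laplacian penalty on the pixel factor G, the update rules, and the parameter names (k, l, lam) are assumptions chosen for illustration. Rows of F and G act as soft memberships of molecules and pixels, respectively.

```python
import numpy as np

def spatial_nmtf(X, W, k, l, lam=0.1, n_iter=200, seed=0, eps=1e-10):
    """Illustrative co-clustering of a non-negative (molecules x pixels) matrix X.

    Assumed objective: ||X - F S G^T||_F^2 + lam * tr(G^T L G), where
    L = D - W is the graph Laplacian of the pixel adjacency matrix W.
    Minimized with multiplicative updates, which keep all factors non-negative.
    """
    rng = np.random.default_rng(seed)
    n_mol, n_pix = X.shape
    F = rng.random((n_mol, k)) + eps   # molecule-to-cluster memberships
    S = rng.random((k, l)) + eps       # strength of each (molecule, pixel) block
    G = rng.random((n_pix, l)) + eps   # pixel-to-region memberships
    D = np.diag(W.sum(axis=1))         # degree matrix of the pixel graph
    for _ in range(n_iter):
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        # The W and D terms come from the Laplacian penalty and pull
        # neighboring pixels toward similar rows of G.
        G *= (X.T @ F @ S + lam * W @ G) / (G @ S.T @ F.T @ F @ S + lam * D @ G + eps)
    return F, S, G
```

Hard clusters follow from `F.argmax(axis=1)` for molecules and `G.argmax(axis=1)` for pixels; increasing `lam` should produce spatially smoother pixel regions at the cost of reconstruction accuracy.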
Our approach integrates data from sequential analyses of different molecular classes on the same tissue, uncovering hidden patterns that arise from spatial correlations and from relationships between molecular layers. We validated the method on murine brain and clear cell renal cell carcinoma (ccRCC) tissues, with the aim of deepening our understanding of biomolecular functions and interactions across tissue regions, which is crucial for understanding key biological mechanisms.
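When several molecular classes are acquired sequentially from the same tissue, one simple integration scheme (an assumption here, not necessarily the authors' approach) is to co-register all layers to a common pixel grid and stack them along the molecule axis, so that a single pixel factor is shared across layers. The hypothetical helper below also rescales each layer to unit Frobenius norm so that no single layer dominates a joint factorization.

```python
import numpy as np

def stack_layers(layers):
    """Hypothetical helper: stack omic layers into one (total_features x n_pixels) matrix.

    layers: dict mapping a layer name (e.g. "lipids") to a non-negative
    (n_features_layer x n_pixels) array. Assumption: all layers are already
    co-registered to the same pixel grid and pixel order.
    Returns the stacked matrix and an array naming the layer of each row.
    """
    n_pix = {mat.shape[1] for mat in layers.values()}
    if len(n_pix) != 1:
        raise ValueError("all layers must share the same pixel grid")
    blocks, owner = [], []
    for name, mat in layers.items():
        norm = np.linalg.norm(mat)  # Frobenius norm of the whole layer
        # Unit-norm scaling keeps large or intense layers from dominating.
        blocks.append(mat / norm if norm > 0 else mat)
        owner.extend([name] * mat.shape[0])
    return np.vstack(blocks), np.array(owner)
```

Keeping the per-row layer labels makes it straightforward to report, for each molecule cluster, which molecular classes it contains.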
The methodology is also broadly applicable to multi-omics settings without spatial dependencies, such as MS data acquired from samples in solution. As a proof of concept, we tested the model on samples from hereditary metabolic diseases analyzed by liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS). Specifically, we evaluated cellular models of methylmalonic acidemia, integrating proteomics, metabolomics, and lipidomics data to uncover specific molecular networks.
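For non-spatial LC-MS/MS data, samples take the place of pixels and there is no neighbor graph, so in a spatially regularized NMTF the spatial penalty weight would simply be set to zero. Before factorization, the omic tables must share a common sample axis. The hypothetical sketch below keeps only the samples measured in every layer and orders them consistently; the table layout and sample IDs are assumptions.

```python
import numpy as np

def align_omics(tables):
    """Hypothetical helper: align omic tables into one (features x samples) matrix.

    tables: dict mapping a layer name (e.g. "proteomics") to a tuple
    (sample_ids, matrix), where matrix is (n_features x n_samples) and
    sample_ids labels its columns. Only samples present in every layer are
    kept, in sorted order.
    """
    shared = sorted(set.intersection(*(set(ids) for ids, _ in tables.values())))
    blocks = []
    for ids, mat in tables.values():
        col = {s: j for j, s in enumerate(ids)}  # sample ID -> column index
        blocks.append(mat[:, [col[s] for s in shared]])
    return np.vstack(blocks), shared
```

Dropping samples missing from any layer is the simplest choice; imputation would retain more samples but adds its own assumptions.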
We also plan to implement the biostatistical model in a web application for multi-omics data integration, providing a user-friendly interface for the scientific community. This work serves as a proof of concept, laying the foundation for interpreting clinical variables in pathophysiological contexts.