Jens Settelmeier (Zurich / CH; Lausanne / CH), Sandra Goetze (Zurich / CH; Lausanne / CH), Julia Boshart (Zurich / CH), Jianbo Fu (Zurich / CH; Lausanne / CH), Sebastian N. Steiner (Zurich / CH), Martin Gesell (Zurich / CH), Peter J. Schüffler (Munich / DE), Diyora Salimova (Freiburg / DE), Patrick G. A. Pedrioli (Zurich / CH; Lausanne / CH), Bernd Wollscheid (Zurich / CH; Lausanne / CH)
Introduction
MultiOmicsAgent (MOAgent) is a Python-based, open-source tool for biomarker discovery that uses machine learning, specifically extreme gradient-boosted decision trees, to process multi-omics data. With cross-platform compatibility, a user-oriented graphical interface and a well-documented API, MOAgent serves both coding professionals and those new to machine learning, and it addresses common data analysis challenges such as data incompleteness, class imbalance and data leakage between data splits that are meant to be disjoint. MOAgent's guided data analysis strategy opens up data-driven insights from digitized clinical biospecimen cohorts and makes advanced data analysis accessible and reliable for a wide audience.
Method
A Monte Carlo-like, multi-step approach uses machine learning and sampling techniques to reduce the initial features of quantitative omics expression tables to a manageable number of the most phenotype-relevant features. MOAgent provides comprehensive output for downstream analysis and interpretation. The most phenotype-predictive features are evaluated by their classification performance, measured as the decision-threshold-independent area under the receiver operating characteristic curve (AUROC) on a separate test set, and by Shapley (SHAP) values, a widely used approach from cooperative game theory in explainable AI (XAI) that comes with desirable properties such as fairness and efficiency.
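The two evaluation criteria named above can be illustrated with a minimal, dependency-free sketch. It is not MOAgent's implementation or API: the function names `auroc` and `shapley_values`, the toy scores and the toy payoff table are assumptions made for illustration only. The AUROC is computed through its pairwise-ranking (Mann-Whitney) identity, which makes the independence from any decision threshold explicit. The Shapley values are computed by exact enumeration over coalitions of three hypothetical features, whereas SHAP tooling for tree models typically relies on faster, model-specific algorithms.

```python
from itertools import combinations
from math import factorial


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    No decision threshold is involved; tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def shapley_values(players, value):
    """Exact Shapley values of a small cooperative game.

    `value` maps a frozenset of players (a coalition) to its payoff.
    The cost grows exponentially with the number of players.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that p joins.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Toy test-set predictions (hypothetical values).
test_labels = [0, 0, 1, 1]
test_scores = [0.1, 0.4, 0.35, 0.8]
test_auroc = auroc(test_labels, test_scores)  # 0.75: 3 of 4 pairs ranked correctly

# Toy payoff table (hypothetical): gain in classification performance
# obtained by each subset of three features A, B and C.
payoff = {
    frozenset(): 0.0,
    frozenset("A"): 0.2,
    frozenset("B"): 0.1,
    frozenset("C"): 0.0,
    frozenset("AB"): 0.4,
    frozenset("AC"): 0.2,
    frozenset("BC"): 0.1,
    frozenset("ABC"): 0.4,
}
features = ["A", "B", "C"]
phi = shapley_values(features, payoff.__getitem__)
```

In this toy game, feature C never changes the payoff of any coalition and therefore receives a Shapley value of zero, while the values of all features sum to the payoff of the full feature set. These are the null-player and efficiency properties that underlie the fairness and efficiency mentioned above.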
Preliminary results
We applied MOAgent in five case studies covering metabolomics, transcriptomics and proteomics abundances, as well as combinations thereof. MOAgent retrieves well-known disease-relevant features as well as further features that play a significant role in phenotype classification and are thus potentially involved in crucial biological processes related to the investigated diseases.
Novel aspect and conclusion
MOAgent enables the integration of multi-omics data using machine learning and facilitates follow-up experiments based on the list of biomarker candidates it provides.