  • Oral presentation
  • OP-37

End-to-end transfer learning in DIA proteomics for the characterization of unseen peptide modifications

Appointment

Location / Stream:
Conference room 1-2

Session

AI and bioinformatics in mass spectrometry

Topic

  • New Technology: AI and Bioinformatics in Mass Spectrometry

Authors

Georg Wallmann (Munich / DE), Marvin Thielert (Munich / DE), Tim Heymann (Munich / DE), Mohamed Kotb (Munich / DE), Xie-Xuan Zhou (Munich / DE), Wen-Feng Zeng (Munich / DE), Matthias Mann (Munich / DE)

Abstract

Data-independent acquisition (DIA)-based proteomics has established itself as a key technology for fast and deep characterization of complex proteomes. Peptides and proteins are commonly identified from DIA data using spectral libraries predicted by deep-learning models that were trained on a large corpus of known spectral matches. DIA search is therefore challenged when properties must be predicted for unseen post-translational modifications or instrument setups. Building on our recently published modular open-source search framework alphaDIA, we introduce a new strategy for DIA search termed end-to-end transfer learning.

AlphaDIA is an open-source framework for complete DIA search workflows. It builds on the scientific Python stack and the alphaX ecosystem, allowing flexible search strategies as well as default workflows that are accessible through a Python API, Jupyter notebooks, and a command-line interface. AlphaDIA covers the entire workflow, from raw file processing to protein quantity reporting, and supports proprietary file formats from major vendors. Engineered for streamlined processing of large cohorts and diverse data sizes, it runs on Windows, Linux, and Mac platforms and can be deployed in a distributed manner in the cloud using Slurm or Docker.

By integrating DIA search with prediction from our deep-learning framework AlphaPeptDeep, custom models are trained for specific experimental setups. Following an initial search, confident identifications are re-quantified and their fragment spectra are collected. Our robust and automated deep-learning pipeline then optimizes the neural network for retention time, ion mobility, charge state and fragment spectra prediction, and a held-out test set is used to ensure generalization. In a second search, alphaDIA uses this trained model to increase identifications and decrease false discoveries.
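The held-out evaluation described above can be illustrated with a deliberately simplified sketch. It does not use the alphaDIA or AlphaPeptDeep APIs. As stated assumptions, a linear retention-time recalibration stands in for fine-tuning the neural network, the "confident identifications" are simulated, and every function name is hypothetical. The point is only the pattern: adapt the predictor on one part of the confident identifications and measure the error on a part it has never seen.

```python
import random
from statistics import mean


def simulate_identifications(n=500, seed=1):
    """Simulated (predicted RT, observed RT) pairs in minutes.

    Assumption: the new setup shifts and stretches retention times relative
    to what the pretrained model predicts. Real data would come from the
    confident identifications of the first search.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        predicted = rng.uniform(0.0, 60.0)
        observed = 1.3 * predicted + 4.0 + rng.gauss(0.0, 0.5)
        pairs.append((predicted, observed))
    return pairs


def split_held_out(items, test_fraction=0.2, seed=0):
    """Shuffle and split into (train, test); the test part is never fitted on."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_linear(pairs):
    """Ordinary least squares fit of observed = slope * predicted + intercept."""
    x_mean = mean(x for x, _ in pairs)
    y_mean = mean(y for _, y in pairs)
    sxx = sum((x - x_mean) ** 2 for x, _ in pairs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in pairs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def mean_abs_error(pairs, slope=1.0, intercept=0.0):
    """Mean absolute RT error after applying the (optional) recalibration."""
    return mean(abs(slope * x + intercept - y) for x, y in pairs)


identifications = simulate_identifications()
train, test = split_held_out(identifications)

# "Transfer learning" stand-in: adapt the predictor on the training part only.
slope, intercept = fit_linear(train)

# Generalization check: compare errors on identifications not used for fitting.
error_before = mean_abs_error(test)
error_after = mean_abs_error(test, slope, intercept)
print(f"held-out MAE before: {error_before:.2f} min, after: {error_after:.2f} min")
```

If the adapted predictor only memorized the training identifications, the held-out error would not drop; a clear reduction on the unseen part is the evidence of generalization that the abstract refers to.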

We show the potential of this technology by applying it to chemical as well as post-translational modifications. In our experiments, end-to-end transfer learning can increase the number of identified precursors by close to 50%, leading to more than 25% additional protein identifications. Rigorous benchmarking established that these gains are not the result of overfitting but reflect generalization of the network to the new data distribution. We further show potential applications to acetylated peptide discovery, HLA peptide characterization and non-tryptic digests.
