  • Poster presentation
  • P-II-0446

Tree-based quantification retrieves proteoforms from bottom-up proteomics data

Schedule

Location / Stream:
New Technology: AI and Bioinformatics in Mass Spectrometry

Topic

  • New Technology: AI and Bioinformatics in Mass Spectrometry

Contributors

Constantin Ammar (Planegg / DE), Marvin Thielert (Planegg / DE), Eugenia Voytik (Planegg / DE), Caroline Weiss (Planegg / DE), Edwin Rodriguez (Planegg / DE), Maximilian T. Strauss (Copenhagen / DK), Florian Rosenberger (Planegg / DE), Wen-Feng Zeng (Planegg / DE), Matthias Mann (Planegg / DE)

Abstract

Recent algorithmic improvements in mass spectrometry (MS)-based proteomics have greatly increased the number of identifications. However, less attention has been paid to proper quantification, which arguably provides the ultimate value of a proteomics experiment. We argue that much of the potential to improve quantification is still untapped. In particular: 1) Much of the data that could contribute to quantification is neglected and condensed into single "protein quantity" values. 2) Comparisons between linked ions (e.g. fragment ions vs. MS1 precursor ions) enable the extraction of rich information that is currently underutilized. 3) Information about the acquisition, such as peak quality and retention time, is currently not integrated into the quantitative feature selection. 4) Peptides of a protein potentially belong to different proteoforms and should be quantified separately, but this is generally not done.

We present AlphaQuant, an end-to-end software pipeline that offers novel solutions to the above issues. AlphaQuant introduces a "tree-based quantification" approach, in which all data are organized into a hierarchical tree that systematically integrates information across the levels of quantification: fragment ions, MS1 isotopes, charge states, modifications, peptides, proteins, and proteoforms. Quantities at the fragment and precursor level are integrated with information about the experimental design and acquisition. Rigorous statistical analyses are performed on all available data, boosting statistical power. Clustering along the tree allows us to assess the consistency across its levels. To optimally integrate this information, we perform machine learning on features derived from the tree structure.
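
The idea of a quantification tree can be illustrated with a minimal sketch. It is not AlphaQuant's implementation or API: the `Node` class, the `aggregate` function, the level names, and the choice of the median as the summary statistic are all assumptions made for illustration. Leaves carry fragment-level log2 fold changes, and each parent summarizes its children.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import List, Optional


@dataclass
class Node:
    """One node of a hypothetical quantification tree (not AlphaQuant's data model)."""
    name: str
    level: str  # e.g. "fragment", "precursor", "peptide", "protein"
    log2fc: Optional[float] = None  # given for leaves, computed for inner nodes
    children: List["Node"] = field(default_factory=list)


def aggregate(node: Node) -> float:
    """Fill in log2 fold changes bottom-up as the median of each node's children.

    The median is an illustrative assumption, chosen because it is robust to
    a single outlying fragment or peptide.
    """
    if not node.children:
        if node.log2fc is None:
            raise ValueError(f"leaf {node.name!r} has no log2 fold change")
        return node.log2fc
    node.log2fc = median(aggregate(child) for child in node.children)
    return node.log2fc


def leaves(prefix: str, values: List[float]) -> List[Node]:
    """Helper that builds fragment-level leaves from made-up fold changes."""
    return [Node(f"{prefix}_frag{i}", "fragment", v) for i, v in enumerate(values)]


# Toy example with invented values: two peptides that disagree strongly.
prec_a = Node("pep1_z2", "precursor", children=leaves("pep1_z2", [1.0, 1.2, 0.8]))
prec_b = Node("pep1_z3", "precursor", children=leaves("pep1_z3", [0.9, 1.1]))
prec_c = Node("pep2_z2", "precursor", children=leaves("pep2_z2", [3.0, 3.2]))
pep1 = Node("pep1", "peptide", children=[prec_a, prec_b])
pep2 = Node("pep2", "peptide", children=[prec_c])
protein = Node("protein_x", "protein", children=[pep1, pep2])

aggregate(protein)
```

After `aggregate(protein)` every inner node holds a summary of the data beneath it, so disagreement between sibling nodes (here `pep1` and `pep2`) stays visible at each level instead of being collapsed into one protein value.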

For differential expression analysis, we show substantially increased sensitivity in proteome, phosphoproteome, and single-cell data, with up to 50-fold increases compared with a current state-of-the-art approach. We further address the notoriously difficult challenge of scoring quantification accuracy (as opposed to precision), i.e. whether the fold change of a peptide reflects the true abundance change. We benchmark on a mixed-species dataset and show superior classification abilities compared with standard metrics, removing biased peptides with an AUC of 0.94. Furthermore, we statistically compare and cluster peptides with similar quantitative behavior within our tree, allowing us to resolve proteoform profiles from bottom-up proteomics data. We combine this with deep learning classification of peptide sequences to infer regulated phosphopeptides from non-enriched proteome data alone. Finally, we apply our method to tissue measurements and shed light on the global proteoform diversity of the mouse proteome.
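
Grouping peptides by similar quantitative behavior can be sketched as follows. This is a deliberately simplified stand-in, not the statistical comparison the abstract describes: it clusters peptides of one protein by a fixed gap threshold on their log2 fold changes. The function name `cluster_peptides`, the `max_gap` parameter, and the peptide names and values are all hypothetical.

```python
from typing import Dict, List


def cluster_peptides(log2fc: Dict[str, float], max_gap: float = 0.5) -> List[List[str]]:
    """Group peptides whose sorted log2 fold changes lie within max_gap of a neighbor.

    A fixed threshold replaces a proper statistical test here; it is an
    assumption for illustration only.
    """
    ordered = sorted(log2fc.items(), key=lambda item: item[1])
    clusters: List[List[str]] = []
    previous = None
    for peptide, value in ordered:
        # Start a new cluster when the jump from the previous peptide is too large.
        if previous is None or value - previous > max_gap:
            clusters.append([])
        clusters[-1].append(peptide)
        previous = value
    return clusters


# Invented values: three peptides are unchanged, one is strongly down-regulated.
peptides = {
    "PEPTIDEA": 0.1,
    "PEPTIDEB": -0.05,
    "PEPTIDEC": 0.2,
    "PHOSPHOSITEPEP": -1.8,
}
groups = cluster_peptides(peptides)
```

In this toy case the outlying peptide ends up in its own group, which is the kind of signal that could point to a separate proteoform, for example an unmodified peptide whose abundance drops when its phosphorylated counterpart increases.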

AlphaQuant will be distributed as an open-source Python package within the AlphaX ecosystem, containing a graphical user interface and a one-click installer.
