Alexander Hogrebe (Berlin / DE), Michael Graber (Berlin / DE), Daniel Zolg (Berlin / DE), Markus Schneider (Berlin / DE), Tobias Schmidt (Berlin / DE), Vishal Sukumar (Berlin / DE), Florian Seefried (Berlin / DE), Siegfried Gessulat (Berlin / DE), Martin Frejno (Berlin / DE)
Background: Post-translational modifications (PTMs), particularly phosphorylation, are commonplace in cell signaling studies, and precise PTM site localization is pivotal for understanding the underlying biology. Current localization tools mainly rely on peak m/z matching and neglect fragment ion intensity differences between modified peptide isomers. Leveraging our deep-learning framework, we accurately predict physicochemical properties of modified peptides, which enables intensity-based localization of modification sites. We assess various scoring methods and compare our approach with existing tools.
Methods: We curated a large training dataset (>1.6 million spectra) of post-translationally modified peptides to develop a precise model for predicting their properties. Our Chimerys search algorithm uses these predictions to score peptide spectrum matches (PSMs) against experimental MS2 spectra, aiming to explain as much MS2 intensity as possible with as few peptides as possible. For localization, we predict spectra for all possible modification isomers simultaneously and calculate normalized scores and localization probabilities for each potential site.
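The localization step described above can be illustrated with a minimal sketch. It is not the Chimerys implementation: the abstract does not specify the score or the normalization, so the cosine similarity, the softmax normalization, the `temperature` parameter, and all function names below are assumptions chosen for illustration only.

```python
import math
from collections import defaultdict


def spectral_similarity(observed, predicted):
    """Cosine similarity between two intensity vectors over the same fragment ions."""
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    if norm_o == 0 or norm_p == 0:
        return 0.0
    return dot / (norm_o * norm_p)


def localization_probabilities(observed, isomer_predictions, temperature=0.05):
    """Turn per-isomer similarity scores into isomer and site probabilities.

    isomer_predictions maps a tuple of modified positions (one tuple per
    isomer) to the predicted intensity vector for that isomer.
    The softmax and its temperature are illustrative assumptions.
    """
    if not isomer_predictions:
        raise ValueError("at least one candidate isomer is required")
    scores = {
        sites: spectral_similarity(observed, predicted)
        for sites, predicted in isomer_predictions.items()
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores.values())
    weights = {
        sites: math.exp((score - max_score) / temperature)
        for sites, score in scores.items()
    }
    total = sum(weights.values())
    isomer_probs = {sites: weight / total for sites, weight in weights.items()}
    # A site's probability is the summed probability of all isomers carrying it.
    site_probs = defaultdict(float)
    for sites, prob in isomer_probs.items():
        for site in sites:
            site_probs[site] += prob
    return isomer_probs, dict(site_probs)


# Toy example: one phosphorylation, two candidate sites (positions 3 and 5).
# Intensities are made up; the first two ions discriminate between the isomers.
observed = [0.9, 0.1, 0.5, 0.0]
candidates = {
    (3,): [1.0, 0.0, 0.5, 0.0],
    (5,): [0.0, 1.0, 0.5, 0.0],
}
isomer_probs, site_probs = localization_probabilities(observed, candidates)
```

In this toy case the observed intensities resemble the prediction for position 3 far more closely, so nearly all probability mass is assigned to that site, even though both isomers share the same non-discriminating fragment.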
Results: Previously, we showed that our deep-learning model accurately predicts fragment ion intensities of phospho-peptides, which can be leveraged for identification and deconvolution of MS2 spectra in the Chimerys search algorithm. Here, we extend its capabilities to the localization of PTM sites in DDA and DIA data, exemplified by but not limited to phosphorylation.
First, we designed a score that combines information on matched peaks with fragment ion intensity information. Calculating this score on the most intense b- and y-ions as well as neutral losses was sufficient for the accurate localization of PTM sites. In a DDA ground truth data set of purified phosphorylated peptides, the intensity-based scoring outperforms PTMProphet and MaxQuant at identifying known localizations: at 99% precision, the algorithm correctly localized 2% more overlapping PSMs than PTMProphet and 6% more than MaxQuant. On this data set, ptmRS did not reach 99% precision.
We then investigated how intensity-based scoring behaves in more complex samples, using a publicly available data set of synthetic phospho-peptides spiked into a complex yeast phospho-peptide background. Here, the algorithm correctly localizes 8% more overlapping PSMs at 99% precision than MaxQuant in DDA and 5% more overlapping precursors than DIA-NN in DIA. When analyzing all PSMs, we identify 32% more correctly localized PSMs than the FragPipe-PTMProphet pipeline. In both DDA and DIA, ptmRS did not reach 99% precision.
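The comparisons above count correctly localized PSMs at 99% precision on ground truth data. The abstract does not state how this cutoff was computed, so the following is only a sketch of one common way to do it, assuming each localization call carries a confidence value and a known correct/incorrect label; the function name and input format are hypothetical.

```python
def correct_at_precision(calls, min_precision=0.99):
    """Count correct calls at the most permissive cutoff meeting min_precision.

    calls is an iterable of (confidence, is_correct) pairs.
    Returns (n_correct, cutoff), or (0, None) if no cutoff reaches the
    requested precision.
    """
    ranked = sorted(calls, key=lambda call: call[0], reverse=True)
    best_correct, best_cutoff = 0, None
    n_correct = 0
    for i, (confidence, is_correct) in enumerate(ranked, start=1):
        n_correct += int(is_correct)
        # Only evaluate at the end of a group of tied confidences,
        # because a cutoff cannot separate calls with equal confidence.
        if i < len(ranked) and ranked[i][0] == confidence:
            continue
        if n_correct / i >= min_precision:
            best_correct, best_cutoff = n_correct, confidence
    return best_correct, best_cutoff


# Toy example with made-up confidences and labels.
toy_calls = [(0.99, True), (0.95, True), (0.90, False), (0.80, True)]
strict = correct_at_precision(toy_calls, min_precision=0.99)
relaxed = correct_at_precision(toy_calls, min_precision=0.60)
```

Under this definition, a tool "does not reach 99% precision" when no confidence cutoff yields a set of calls with at least 99% correct localizations, which the sketch reports as `(0, None)`.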
Finally, we integrated intensity-based localization into Chimerys, removing the need to pair Chimerys with ptmRS. The algorithm is compatible with both DDA and DIA data analysis, providing a powerful new tool for PTM research.
Conclusions: We present a novel algorithm utilizing predicted fragment ion intensities for the intensity-based localization of PTM sites in DIA and DDA data.