Poster

  • P-II-0448

Mixed-species based computational simulation of proteoform groups

Contribution in

New Technology: AI and Bioinformatics in Mass Spectrometry

Poster topics

Contributors

Isabell Bludau (Heidelberg / DE), Constantin Ammar (Martinsried / DE), Lukas Käll (Solna / SE), Veit Schwämmle (Odense / DK), Felix Sahm (Heidelberg / DE)

Abstract

Many genes are known to give rise to multiple different protein products, also called proteoforms, that differ in their primary amino acid sequence and associated modifications. Although this expansion of molecular diversity at the protein level plays a crucial role in functional diversification, proteoforms are challenging to detect in classical bottom-up proteomics workflows and are typically neglected. While this simplifies downstream interpretation of results, it is also known to introduce biases and discards valuable peptide-level information. To address this shortcoming, we and others have previously developed computational tools that leverage peptide expression information across large datasets to infer proteoform groups. A key challenge in the development of such tools is the lack of available ground truth data that can be used for training and benchmarking.

Here, we propose a new strategy to computationally simulate proteoform data based on mixed-species experiments with various mixing ratios. The core concept is to create proteoform groups by combining peptides from proteins of different species. Importantly, peptides shared between organisms mimic peptides shared between different proteoforms. The proposed approach conserves the natural variation and noise of proteomics measurements while also reflecting the key characteristics of differential proteoform expression.
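
The core concept can be illustrated with a minimal Python sketch. This is not ProteoformMixer's actual API: the function name `simulate_proteoform_protein`, its parameters, and the toy intensities are hypothetical and chosen only for illustration. The sketch grafts peptides from a protein of one species onto a protein of another species, so that the grafted peptides follow a different quantitative profile across mixing ratios and form a second proteoform group. It assumes that the two proteins share no peptide sequences.

```python
import random


def simulate_proteoform_protein(host_peptides, donor_peptides, n_donor, seed=0):
    """Build one simulated protein with two proteoform groups (hypothetical helper).

    host_peptides / donor_peptides: dict mapping peptide sequence to a list of
    intensities (one value per run) for a protein of species A / species B.
    n_donor: number of donor-species peptides to graft onto the host protein.
    Returns a dict mapping peptide -> (proteoform_group, intensities).
    """
    if set(host_peptides) & set(donor_peptides):
        raise ValueError("this sketch assumes no shared peptide sequences")
    if n_donor > len(donor_peptides):
        raise ValueError("n_donor exceeds the number of donor peptides")

    rng = random.Random(seed)
    # Sort the keys first so the random choice is reproducible for a given seed.
    chosen = rng.sample(sorted(donor_peptides), n_donor)

    # Host peptides form group 0; grafted donor peptides form group 1.
    simulated = {pep: (0, vals) for pep, vals in host_peptides.items()}
    for pep in chosen:
        simulated[pep] = (1, donor_peptides[pep])
    return simulated


# Toy intensities across three runs (invented values, not real data).
# The "mouse" protein is stable, the "yeast" protein rises with the mixing ratio.
mouse_protein = {
    "AAAK": [10.1, 10.0, 9.9],
    "CCCK": [12.0, 12.2, 11.9],
    "DDDR": [8.1, 8.0, 8.2],
}
yeast_protein = {
    "EEEK": [5.0, 6.1, 7.0],
    "FFFR": [4.2, 5.0, 6.1],
    "GGGK": [6.0, 7.1, 8.0],
}

simulated = simulate_proteoform_protein(mouse_protein, yeast_protein, n_donor=2)
for peptide, (group, intensities) in sorted(simulated.items()):
    print(peptide, group, intensities)
```

Because the intensities are taken from measured peptides rather than drawn from a statistical model, a simulation of this kind retains the measurement noise of the original runs, while the group labels provide the ground truth.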

To enable easy adoption by the community, we developed ProteoformMixer, an open-source Python library for mixed-species based proteoform simulation. Users can either use a deposited dataset or load their own mixed-species data. Proteoforms are subsequently simulated according to user-defined characteristics such as the desired number of proteoform groups per protein and the peptide distribution across them. ProteoformMixer also offers basic benchmarking functionality. Users can load their proteoform grouping results to retrieve the sensitivity and specificity for overall detection as well as statistics on correct peptide assignment.
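
As an illustration of the detection metrics mentioned above, the following sketch computes sensitivity and specificity at the protein level. It is not taken from ProteoformMixer: the function name `sensitivity_specificity` and the input format (one boolean per protein indicating whether it carries more than one proteoform group) are assumptions made for this example, and the protein identifiers are invented.

```python
def sensitivity_specificity(truth, predicted):
    """Protein-level detection metrics (hypothetical helper).

    truth / predicted: dict mapping protein ID -> bool, where True means the
    protein has (or was called as having) more than one proteoform group.
    Proteins missing from `predicted` are treated as negative calls.
    """
    tp = fn = tn = fp = 0
    for protein, has_proteoforms in truth.items():
        called = predicted.get(protein, False)
        if has_proteoforms and called:
            tp += 1
        elif has_proteoforms and not called:
            fn += 1
        elif not has_proteoforms and called:
            fp += 1
        else:
            tn += 1

    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Invented example: two simulated proteoform proteins, two unmodified controls.
truth = {"P1": True, "P2": True, "P3": False, "P4": False}
predicted = {"P1": True, "P2": False, "P3": False, "P4": True}

sens, spec = sensitivity_specificity(truth, predicted)
print(sens, spec)  # 0.5 0.5
```

Statistics on correct peptide assignment would need a separate comparison of predicted and simulated group labels for each peptide, which this sketch does not cover.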

To showcase the strategy and functionality of ProteoformMixer, we utilized a previously published mixed-species dataset (Lou et al., 2023), comprising 35 DIA runs with 6 different mixing ratios of mouse and yeast proteomes. We simulated proteoform datasets with diverse characteristics to reflect a broad range of applications compatible with various proteoform detection algorithms, including PeCorA (Dermit et al., 2021), COPF (Bludau et al., 2021), and TPP-specific implementations (Kurzawa et al., 2023). While these tools provide an initial glimpse into the proteoform diversity detectable by bottom-up proteomics, further advancements are needed to enhance robustness, applicability, sensitivity, and specificity. We believe our new mixed-species based proteoform simulation approach will be a valuable model for training and benchmarking emerging proteoform detection tools, thereby supporting future developments in the field.
