Marie Locard-Paulet (Toulouse / FR), Nadezhda T. Doncheva (Copenhagen / DK), John H. Morris (San Francisco, CA / US), Lars Juhl Jensen (Copenhagen / DK)
In high-throughput mass spectrometry (MS), proteins are digested into peptides, and the peptide MS signals are then used to infer relative protein quantities across samples. Proteins that cannot be unambiguously distinguished based on the available set of peptides are reported as protein groups containing several protein accessions. However, typical follow-up analyses such as gene-set enrichment and protein interaction network analysis rely on gene-level annotation. They can therefore only be performed on single proteins or genes, which makes them incompatible with protein-group output. Currently, there is no best practice for handling this mismatch, and its impact on functional analysis has not yet been studied.
Here, we investigate the composition of protein groups identified in 14 published proteomics data sets, including deep proteomes, phosphoproteomics data, single-cell proteomics, and pull-downs from different species. We show that the choice of which single protein is selected from each protein group can affect gene-set enrichment and network analysis to varying degrees, and that this selection should not be overlooked. To address this, we developed the Cytoscape app Proteo Visualizer, which complements the widely used stringApp by creating STRING networks from protein groups rather than single protein accessions. In the resulting networks, each protein group is represented as a single node that inherits all existing edges of its members, so no information from any protein accession is discarded. This app opens new avenues for performing network analysis with protein groups from bottom-up MS studies.
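
The idea of a group node inheriting the edges of its members can be illustrated with a minimal Python sketch. This is not the Proteo Visualizer implementation: the function name `collapse_groups`, the accessions, and the edge list are hypothetical, and edge scores (which STRING provides) are ignored for simplicity.

```python
def collapse_groups(groups, protein_edges):
    """Collapse protein-level edges onto protein-group nodes.

    groups: dict mapping a group id to its member accessions (hypothetical input).
    protein_edges: iterable of (accession_a, accession_b) pairs.
    Returns a set of frozensets, each holding the two group ids of one edge.
    """
    # Map each accession to every group that lists it as a member.
    member_to_groups = {}
    for group_id, members in groups.items():
        for accession in members:
            member_to_groups.setdefault(accession, set()).add(group_id)

    group_edges = set()
    for a, b in protein_edges:
        # Accessions outside any group contribute no edge.
        for group_a in member_to_groups.get(a, ()):
            for group_b in member_to_groups.get(b, ()):
                if group_a != group_b:  # skip edges within a single group
                    group_edges.add(frozenset((group_a, group_b)))
    return group_edges


# Toy example with made-up accessions.
groups = {"G1": ["P1", "P2"], "G2": ["P3"], "G3": ["P4"]}
protein_edges = [("P1", "P3"), ("P2", "P4"), ("P1", "P2"), ("P3", "P5")]
group_edges = collapse_groups(groups, protein_edges)
```

In this toy case, node G1 ends up connected to both G2 (through P1) and G3 (through P2), whereas selecting only P1 or only P2 as the group representative would retain just one of the two edges.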