  • Oral presentation
  • OP-65

PTMeXchange species specific PTM builds: meta-analysis of datasets and dissemination of high-quality PTM data for community use

Schedule

Date:
Time:
Speaking time:
Discussion time:
Location / Stream:
Plenary hall

Session

AI and Bioinformatics Approaches

Topic

  • Data Integration: With Bioinformatics to Biological Knowledge

Contributors

Kerry Ramsbottom (Liverpool / GB), Ellen Boswell (Liverpool / GB), Oscar Martin Camacho (Liverpool / GB), Shireen Al-Momani (Liverpool / GB), Ananth Prakash (Cambridge / GB), Yasset Perez Riverol (Cambridge / GB), Zhi Sun (Seattle, WA / US), Deepti Kundu (Cambridge / GB), Emily Bowler-Barnett (Cambridge / GB), Maria Martin (Cambridge / GB), Jun Fan (Cambridge / GB), Eric Deutsch (Seattle, WA / US), Juan Vizcaino (Cambridge / GB), Andrew Jones (Liverpool / GB)

Abstract

Post-translational modifications (PTMs), of which phosphorylation is the most studied, play an important role in biological functions. Mass spectrometry (MS) and database searching are commonly used to detect and localise modification sites on proteins, with confidence being governed by peptide-spectrum match (PSM) and PTM localisation statistics. The aim of the PTMeXchange project is to re-analyse public enriched PTM datasets, focusing on accurate PTM localisation, integrating data across studies and disseminating the data to UniProtKB, linking it to the original MS evidence in PRIDE and PeptideAtlas in order to make PTM data FAIR (Findable, Accessible, Interoperable and Reusable). Here we demonstrate the re-analysis workflow using publicly available mass spectrometry proteomics datasets.

Publicly available mass spectrometry datasets from ProteomeXchange were selected and curated to identify relevant datasets for each investigated species and modification. An open data re-analysis pipeline, using the Trans-Proteomic Pipeline (TPP) and statistical methods to control PTM false localisation, was applied. The "PTM build" data from these re-analyses are then integrated into public databases, including UniProtKB, PRIDE and PeptideAtlas. This enables detailed exploration of scores and visualisation of source mass spectra, as a full evidence trail. Here we demonstrate the pipeline for species-specific PTM builds, including data from our first major build, phosphorylation in Asian rice (Oryza sativa).

For our first build, Oryza sativa, we identified eight relevant phosphoproteomics datasets. We performed a simple meta-analysis combining all datasets and assigning sites to Gold, Silver and Bronze categories based on their PTM localisation scores and occurrences across datasets. This identified 15,565 high-quality phosphosites on serine, threonine and tyrosine residues of rice proteins. We incorporated phosphoalanine (pAla) decoy identifications, enabling validation of false reporting within these categories. Only two pAla hits were observed in the Gold set, with 475 sites across all categories, indicating that the overall false localisation rate (FLR) is very low.
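As an illustration only, the tiered categorisation and the pAla decoy check could be sketched as below. The probability thresholds, the dataset-count rule, the `Site` fields and the function names are all hypothetical and are not the criteria used in the actual builds. The decoy fraction shown is a raw proportion, and a real FLR estimate would likely need further correction, for example for the relative frequency of alanine versus serine, threonine and tyrosine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    protein: str
    position: int
    residue: str       # "S", "T", "Y", or "A" for a pAla decoy site
    best_prob: float   # best localisation probability observed (0..1)
    n_datasets: int    # number of datasets supporting the site


# Hypothetical thresholds, chosen purely for illustration.
HIGH_PROB = 0.99
MIN_PROB = 0.95


def categorise(site: Site) -> str:
    """Assign a site to a tier from its score and dataset support."""
    if site.best_prob >= HIGH_PROB and site.n_datasets >= 2:
        return "Gold"
    if site.best_prob >= HIGH_PROB:
        return "Silver"
    if site.best_prob >= MIN_PROB:
        return "Bronze"
    return "Unranked"


def decoy_fraction(sites, category: str) -> float:
    """Raw fraction of pAla decoy sites among all sites in one tier."""
    in_category = [s for s in sites if categorise(s) == category]
    if not in_category:
        return 0.0
    decoys = sum(1 for s in in_category if s.residue == "A")
    return decoys / len(in_category)
```

A tier with few or no decoy sites relative to its size suggests that false localisations in that tier are rare, which is the reasoning behind reporting pAla hits per category.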

We have now generated several phosphorylation builds for a variety of species including Plasmodium falciparum, Saccharomyces cerevisiae, Homo sapiens and Mus musculus. We are also covering other modifications, requiring more complex workflows, including SUMOylation, lysine acetylation and ubiquitination.

The re-analysed data for the completed builds have been loaded into UniProtKB, enabling researchers to visualise sites alongside other data on relevant proteins (e.g. structural models from AlphaFold2), and into PeptideAtlas and the PRIDE database, enabling visualisation of source evidence, including scores. We also demonstrate how Universal Spectrum Identifiers (USIs) can be used to visualise the mass spectra supporting modification sites, making them a valuable tool for investigating the evidence behind identified modifications.
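For readers unfamiliar with USIs, the sketch below shows how such an identifier is assembled from its colon-separated parts (the `mzspec` prefix, dataset collection, MS run name, index type, index and an optional interpretation). The dataset accession, run name and peptide in the usage lines are placeholders rather than real PTMeXchange records, and the helper names `build_usi` and `parse_usi` are hypothetical. The parser assumes the run name itself contains no colons.

```python
def build_usi(collection, ms_run, index, interpretation=None, index_type="scan"):
    """Join the USI fields into a single colon-separated identifier."""
    parts = ["mzspec", collection, ms_run, index_type, str(index)]
    if interpretation:
        parts.append(interpretation)
    return ":".join(parts)


def parse_usi(usi):
    """Split a USI into named fields (assumes no colons in the run name)."""
    fields = usi.split(":", 5)
    if len(fields) < 5 or fields[0] != "mzspec":
        raise ValueError(f"not a valid USI: {usi!r}")
    return {
        "collection": fields[1],
        "ms_run": fields[2],
        "index_type": fields[3],
        "index": fields[4],
        "interpretation": fields[5] if len(fields) == 6 else None,
    }


# Placeholder values: not a real dataset, run or identification.
example = build_usi("PXD000000", "example_run", 1234, "AS[Phospho]PEPTIDEK/2")
print(example)
print(parse_usi(example)["interpretation"])
```

A USI built this way can be pasted into a spectrum viewer that supports the standard, so that a reported modification site can be checked against the spectrum it came from.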
