  • Oral presentation
  • OP-04

An integrated landscape of mRNA and protein isoforms

Appointment

Location / Stream:
Conference room 1-2

Session

Multiomics approaches and data integration

Topic

  • Multiomics Approaches

Authors

Henrik Zauber (Berlin / DE), Amir Kedan (Berlin / DE), Mengran Wang (Berlin / DE), Wei Chen (Berlin / DE), Matthias Selbach (Berlin / DE)

Abstract

Cellular processes such as alternative splicing (AS) and proteolytic processing generate various protein isoforms from a single protein-coding gene. Studies using high-throughput sequencing indicate that >95% of multi-exon genes in humans undergo AS [1,2]. In addition, proteolytic processing contributes to proteome complexity and is involved in processes such as cell cycle regulation, cell signaling, and apoptosis. Current proteomic methods are not well suited to detecting protein isoforms. Standard shotgun (bottom-up) proteomics involves digesting proteins into peptides. Although this approach identifies many proteins, it loses isoform information [3]. In contrast, mass spectrometric analysis of intact proteins (top-down proteomics) can distinguish protein isoforms but covers only a small subset of the proteome [4]. Here, we developed peptide correlation profiling (PepCP) as a method to obtain protein-level information from peptide-centric (bottom-up proteomic) data. First, proteins are fractionated by SDS-PAGE into polypeptides of different lengths. Second, individual protein fractions are digested into peptides. Third, peptides are identified and quantified across all fractions using quantitative mass spectrometry-based proteomics. Finally, peptide abundance profiles across fractions are analyzed to obtain protein-level information. We combined PepCP with long-read RNA sequencing to provide an integrated landscape of mRNA and protein isoforms in human RPE-1 cells. We established a reference set of 45,223 full-length transcripts and developed a computational framework to automatically detect different proteoforms at both the mRNA level (32,612 protein isoforms from 12,400 unique genes) and the protein level (16,983 protein isoforms from 8,168 unique genes), providing the largest available integrated dataset. Our data capture many well-known protein isoforms created by AS, alternative translation, and protein processing. Most of the detectable alternatively spliced transcripts are indeed detected at the protein level. In addition, we provide evidence for numerous new protein isoforms, presumably generated by protein processing, that cannot be observed at the mRNA level. Our data provide a unique resource of integrated mRNA and protein isoforms and pave the way for a deeper understanding of proteome complexity.
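
The final PepCP step, analyzing peptide abundance profiles across gel fractions, can be illustrated with a minimal sketch. The abstract does not state how profiles are compared, so everything below is an assumption made for illustration: Pearson correlation as the similarity measure, a hypothetical threshold of 0.8, a simple greedy grouping, and invented toy intensities for four peptides of one gene across six fractions. It is not the authors' computational framework.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no migration information
    return cov / sqrt(var_x * var_y)


def group_peptides(profiles, threshold=0.8):
    """Greedily group peptides whose profiles correlate with a group's first member.

    `threshold` is a hypothetical cutoff chosen for this example.
    """
    groups = []
    for name, profile in profiles.items():
        for group in groups:
            seed_profile = profiles[group[0]]
            if pearson(profile, seed_profile) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])  # no matching group: start a new one
    return groups


# Invented toy data: peptide intensities across six SDS-PAGE fractions.
profiles = {
    "pep_A": [0, 1, 8, 1, 0, 0],  # peaks in fraction 3
    "pep_B": [0, 2, 9, 2, 0, 0],  # peaks in fraction 3
    "pep_C": [0, 0, 0, 1, 7, 1],  # peaks in fraction 5
    "pep_D": [0, 0, 1, 2, 9, 1],  # peaks in fraction 5
}

groups = group_peptides(profiles)
print(groups)
```

In this toy case, peptides that co-migrate fall into the same group, so two groups from one gene would hint at two polypeptides of different length, for example a full-length protein and a processed fragment. Real data would also need to handle peptides shared between isoforms, missing values, and noise, none of which this sketch addresses.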


1. Pan, Q. et al. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet 40, 1413–1415 (2008).
2. Wang, E. T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008).
3. Sinitcyn, P. et al. Global detection of human variants and isoforms by deep proteome sequencing. Nat Biotechnol 41, 1776–1786 (2023).
4. Tran, J. C. et al. Mapping intact protein isoforms in discovery mode using top-down proteomics. Nature 480, 254–258 (2011).
