Diego Fernando Garcia del Rio (Halle (Saale) / DE; Lille / FR), Amelie Bonnefond (Lille / FR), Isabelle Fournier (Lille / FR), Michel Salzet (Lille / FR), Tristan Cardon (Lille / FR)
Recent studies in bioanalytics, biology, and biomedicine have increasingly focused on alternative open reading frame (ORF)-encoded proteins (AltProts), in addition to the traditional reference proteins (RefProts). Previously overlooked, these proteins have also been classified as microproteins, cryptic proteins, and small ORF-encoded proteins. The OpenProt database shows that AltProts originate from various sources, such as 3′ and 5′ UTRs, +1 and +2 frameshifts in the main ORF, and non-coding RNAs (ncRNAs). Overall, this ghost or unreferenced proteome adds to the complexity of proteomes in all eukaryotic organisms.
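To illustrate how a single transcript can carry ORFs in more than one reading frame, the following is a minimal sketch that scans the three forward frames of a short, invented sequence for ATG-to-stop ORFs. It is not the OpenProt pipeline; the function name `find_orfs`, the `min_len` parameter, and the toy sequence are hypothetical, and frames are numbered 0, 1, 2 from the first nucleotide rather than relative to an annotated main ORF.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def find_orfs(seq, min_len=2):
    """Return (frame, start_index, peptide) for each ATG-to-stop ORF.

    Simplifications (assumptions of this sketch): forward strand only,
    ATG start codons only, ORFs lacking a stop codon are discarded.
    """
    orfs = []
    for frame in range(3):
        start, peptide = None, []
        for i in range(frame, len(seq) - 2, 3):
            aa = CODON_TABLE[seq[i:i + 3]]
            if start is None:
                if aa == "M":              # open a new ORF at the first ATG
                    start, peptide = i, ["M"]
            elif aa == "*":                # stop codon closes the current ORF
                if len(peptide) >= min_len:
                    orfs.append((frame, start, "".join(peptide)))
                start = None
            else:
                peptide.append(aa)
    return orfs


# Invented toy sequence: one ORF in frame 0 and an overlapping one in frame 2.
toy_transcript = "ATGGCATGAAATAG"
orfs = find_orfs(toy_transcript)
print(orfs)
```

In this toy case the second ORF starts inside the first one but is read in a different frame, which is the situation the frameshifted AltProts described above correspond to.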
Although this is a new field of research, advanced techniques such as mass spectrometry-based proteomics, next-generation sequencing, RNA sequencing (RNA-seq), and ribosome profiling have enabled large-scale identification of these proteins. However, their functions remain largely unknown. Some are associated with cell regulation and development, while others have been linked to pathological processes, including cancer. Therefore, understanding their role in cancer-related molecular processes is vital.
In this study, we developed a methodology combining cross-linking MS (XL-MS), subcellular fractionation, and proteogenomics, and applied it to two ovarian cancer cell lines (PEO-4 and SKOV-3) and one normal ovarian epithelial cell line (T1074). Subcellular fractionation reduced the complexity of the samples analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS). With RNA-seq data, we generated customized protein databases for each cell line, which helped identify protein-protein interactions (PPIs). We validated these cross-linked PPIs by modeling the 3D structures of the AltProts, performing docking studies, and measuring the corresponding cross-link distances. Network analysis suggested potential roles for AltProts in biological functions and processes.
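The distance-based validation step can be sketched as follows: given the coordinates of the cross-linked residues in a docked model, compute the distance between each pair and compare it with an upper bound set by the cross-linker. This is only an illustration under stated assumptions. The abstract does not name the cross-linker or the cutoff used, so the 30 Å limit, the residue identifiers, and the coordinates below are all hypothetical placeholders.

```python
import math

# Assumed upper bound (in Å) on the Cα-Cα distance for a satisfied cross-link.
# Placeholder value: the actual cutoff depends on the cross-linker used.
MAX_CA_DISTANCE = 30.0


def check_crosslinks(coords, crosslinks, cutoff=MAX_CA_DISTANCE):
    """Return (residue_a, residue_b, distance, satisfied) for each cross-link.

    coords maps a (chain, residue_number) key to an (x, y, z) Cα position.
    """
    results = []
    for res_a, res_b in crosslinks:
        d = math.dist(coords[res_a], coords[res_b])  # Euclidean distance
        results.append((res_a, res_b, round(d, 1), d <= cutoff))
    return results


# Invented Cα coordinates standing in for a docked RefProt-AltProt model.
coords = {
    ("RefProt", 10): (0.0, 0.0, 0.0),
    ("AltProt", 5): (3.0, 4.0, 12.0),
    ("AltProt", 20): (30.0, 40.0, 0.0),
}
crosslinks = [
    (("RefProt", 10), ("AltProt", 5)),
    (("RefProt", 10), ("AltProt", 20)),
]

results = check_crosslinks(coords, crosslinks)
for res_a, res_b, d, ok in results:
    print(res_a, res_b, d, "satisfied" if ok else "violated")
```

A cross-link whose measured distance exceeds the cutoff would argue against that docking pose, while a satisfied cross-link is consistent with it but does not prove it.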
Our approach revealed an interaction between POLD3 and the AltProt IP_183088, which molecular docking placed between the POLD3-POLD2 binding sites, suggesting a possible role in DNA replication and repair. Other benefits of this workflow include the non-targeted identification of 597 AltProts based on predictions from RNA-seq experiments, variant annotation, and the analysis of differential expression according to subcellular localization.
This study underscores the considerable potential of proteogenomics in revealing new aspects of ovarian cancer biology. It enables the identification of previously unexplored proteins and variants that may have functional significance. The use of customized protein databases and the cross-linking approach has shed light on the "ghost proteome," a hitherto unexplored area.