Leron Kok (Utrecht / NL), Eric Deutsch (Seattle, WA / US), Michal Bassani-Sternberg (Lausanne / CH), Jyoti Choudhary (London / GB), Ivo Fierro-Monti (Cambridge / GB), Robert L. Moritz (Seattle, WA / US), Jonathan Mudge (Cambridge / GB), John Prensner (Ann Arbor, MI / US), Jorge Ruiz-Orera (Berlin / DE), Nicola Ternette (Oxford / GB; Utrecht / NL), Juan Antonia (Cambridge / GB), Sebastiaan van Heesch (Utrecht / NL)
We and others have recently demonstrated the widespread translation of thousands of short non-canonical open reading frames (ncORFs). Translation of ncORFs is frequent in cancer and indicative of protein production, which could drastically expand the proteome. This "dark proteome" has gained interest as a source of new tumor-specific antigens for immunotherapeutic targeting. However, most ncORFs currently lack high-confidence protein-level evidence. To address this, we established an international consortium with experts from PeptideAtlas, HUPO-HIPP, GENCODE, and the Ribo-seq ORF Consortium. Using the PeptideAtlas workflow, we reanalyzed 3.8 billion raw spectra from 414 studies, comprising 95,520 MS runs, for the presence of 7,264 ncORFs. Focusing on a subset of 118 immunopeptidomics datasets comprising 240 million spectra from 10,095 MS runs, we confirmed the peptide-level presence of 1,785 of the 7,264 ncORFs. Of these ncORFs, 39% were supported by at least two unique HLA peptides, and 348 qualified as Tier 1B "Presented" protein candidates according to our previously proposed non-canonical protein annotation guidelines (Prensner et al., MCP 2023). Our analyses yielded two main observations. First, the detected ncORF peptides show strong concordance (95%) with in silico HLA binding predictions and are mostly processed from ncORF-derived microproteins with a high isoelectric point. Second, certain HLA alleles presented more ncORF peptides than others. We could link this directly to the enrichment in ncORFs of certain amino acids that precisely match the anchor residues of specific HLA alleles. This could indicate that cancer patients carrying such HLA alleles might be better candidates for future immunotherapeutic strategies targeting ncORF-derived antigens.
Our analyses provide reference-annotation-quality, manually curated evidence for the HLA presentation of ncORFs, offering critical insights that can aid the development of immunotherapies against this class of antigens.