Liang Xue (Cambridge, MA / US), Mykola Bordyuh (Cambridge, MA / US), Djork-Arné Clevert (Berlin / DE), Robert Stanton (Cambridge, MA / US)
The identification of neo-peptide antigens is essential for the development of immunotherapies such as chimeric antigen receptor T-cell (CAR-T) therapy and mRNA and peptide vaccines. While mass spectrometry-based peptidomics is a powerful tool for identifying peptides, it has yet to reach its full potential in the discovery of non-canonical antigens. Challenges include the need to match the acquired spectra against reference databases that may not contain the neo-peptide sequence, reduced statistical power as the search space expands, and low throughput due to the computational cost.
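To make the database constraint concrete, the following is a minimal sketch of reference-database matching, not the pipeline used in this work. It assumes unmodified peptides, singly charged b- and y-ions, and a plain shared-peak count as the score; the function names (`fragment_mzs`, `match_score`, `search`) and the 0.02 m/z tolerance are hypothetical choices made for illustration.

```python
# Monoisotopic amino acid residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_mzs(peptide):
    """Singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions + y_ions


def match_score(peptide, peaks, tol=0.02):
    """Count theoretical fragments that have an observed peak within tol."""
    return sum(
        any(abs(mz - peak) <= tol for peak in peaks)
        for mz in fragment_mzs(peptide)
    )


def search(peaks, database, tol=0.02):
    """Return the (peptide, score) pair with the highest score in the database."""
    scored = ((pep, match_score(pep, peaks, tol)) for pep in database)
    return max(scored, key=lambda pair: pair[1])


# Idealized, noise-free spectrum of SIINFEKL (illustrative input only).
peaks = fragment_mzs("SIINFEKL")

# The true sequence is recovered only when the database contains it.
with_target = search(peaks, ["SIINFEKL", "GILGFVFTL"])
without_target = search(peaks, ["GILGFVFTL", "NLVPMVATV"])
```

In this sketch, `with_target` selects the true sequence with all 14 fragments matched, whereas `without_target` can only return the best of the wrong candidates. A neo-peptide absent from the reference database is in the same position, and enlarging the database to include more candidates increases the number of comparisons per spectrum.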
Inspired by the rapid growth of deep learning methods across industries, we developed a general-purpose language model that learns a joint representation of peptides and spectra, and applied it to peptide identification in immunopeptidomics. We benchmarked the model against other machine learning models as well as classical de novo sequencing software.
Our general-purpose model achieved results on neo-peptide identification comparable to those of classical analytical methods and other machine learning-based models, while offering speed, reduced dependence on heuristic decisions, and versatility for downstream tasks.