Zhendong Liang (Shenzhen / CN), Tianze Ling (Beijing / CN), Tingpeng Yang (Shenzhen / CN), Yonghong He (Shenzhen / CN), Yu Wang (Shenzhen / CN), Cheng Chang (Beijing / CN)
De novo peptide sequencing is essential for identifying novel proteins, yet its broader application is constrained by the lack of a robust quality control system. To address this, we developed π-xNovo, a transformer-based model that substantially improves sequencing accuracy. Through a detailed analysis of the model's attention matrix, we elucidated how individual mass spectrometry peaks contribute to amino acid predictions. Building on these insights, we designed the π-xNovo-QCS quality control system, which assesses the reliability of peptide predictions with accuracy exceeding 80% and sensitivity above 90%. Applying this approach to a large-scale, deep human proteome dataset identified 1,931,761 additional peptides, a 137% increase over conventional database search results. These newly identified peptides increased protein identifications by 17.9%, the detection of single amino acid polymorphism events by 23.59%, and the detection of exon-skipping splicing events by 20.02%. This explainable AI system has significant potential to expand the application of de novo peptide sequencing, particularly for exploring the dark matter of the human proteome.