Metaproteomics based on tandem mass spectrometry is a crucial tool for studying microbial communities. A major challenge in identifying microbial proteins from metaproteomics data is the very large protein sequence search space, which lowers both the accuracy and the identification rate of microbial peptides. To overcome this issue, we developed a new workflow, DeepMeta, which integrates retention time and fragment ion intensity information predicted by deep learning to improve both the rate and the accuracy of peptide identification in metaproteomics.
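As an illustration of how predicted retention time and fragment ion intensities can inform peptide identification, the following minimal sketch derives two rescoring features for a single peptide-spectrum match (PSM). It is not the DeepMeta implementation: the function names, the dictionary keys, and the choice of absolute retention time error and cosine similarity as features are all assumptions made for this example, and the predicted values are taken as given rather than produced by a model.

```python
import math


def rt_error(observed_rt, predicted_rt):
    """Absolute difference between observed and predicted retention time."""
    return abs(observed_rt - predicted_rt)


def cosine_similarity(observed, predicted):
    """Cosine similarity between observed and predicted fragment intensities.

    Both sequences are assumed to be aligned, i.e. position i refers to the
    same fragment ion in both. Returns 0.0 if either vector is all zeros.
    """
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    if norm_o == 0 or norm_p == 0:
        return 0.0
    return dot / (norm_o * norm_p)


def rescoring_features(psm):
    """Build a feature dict for one PSM (hypothetical key names)."""
    return {
        "rt_error": rt_error(psm["observed_rt"], psm["predicted_rt"]),
        "intensity_similarity": cosine_similarity(
            psm["observed_intensities"], psm["predicted_intensities"]
        ),
    }


# Toy PSM with made-up values, for illustration only.
example_psm = {
    "observed_rt": 10.0,
    "predicted_rt": 12.5,
    "observed_intensities": [1.0, 0.0, 2.0],
    "predicted_intensities": [1.0, 0.0, 2.0],
}
features = rescoring_features(example_psm)
```

A correct match would be expected to show a small retention time error and a high intensity similarity, so features of this kind could be passed, together with the search engine score, to a downstream rescoring step.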
We evaluated the performance of DeepMeta on three in silico datasets: two generated with the entrapment strategy and one real-world simulated dataset. We applied DeepMeta to two search engines, Comet and MS-GF+, and compared it with traditional one-step and two-step search methods; DeepMeta identified 25%-45% more peptides. At comparable accuracy, the recall of DeepMeta averaged around 98%, whereas that of the traditional methods averaged only 85%-90%. DeepMeta also increased the overlap between the peptides identified by the two search engines, further supporting the accuracy of its results.
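The evaluation measures mentioned above can be illustrated with a small sketch. It assumes that an in silico dataset provides a known set of true peptides, and it computes recall, precision, and the overlap between two search engines on toy peptide sets. The function names, the use of the Jaccard index as the overlap measure, and the example data are assumptions for illustration, not the evaluation code or results of the study.

```python
def recall(identified, truth):
    """Fraction of true peptides that were identified."""
    if not truth:
        return 0.0
    return len(identified & truth) / len(truth)


def precision(identified, truth):
    """Fraction of identified peptides that are true peptides."""
    if not identified:
        return 0.0
    return len(identified & truth) / len(identified)


def overlap_fraction(set_a, set_b):
    """Jaccard index: shared peptides divided by all peptides in either set."""
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Toy peptide sets (hypothetical labels, not real sequences or results).
truth = {"PEP_A", "PEP_B", "PEP_C", "PEP_D"}
engine_1 = {"PEP_A", "PEP_B", "PEP_C", "PEP_X"}  # PEP_X is a false hit
engine_2 = {"PEP_A", "PEP_B", "PEP_D"}

recall_1 = recall(engine_1, truth)
precision_1 = precision(engine_1, truth)
shared = overlap_fraction(engine_1, engine_2)
```

In this toy case the first engine recovers three of the four true peptides with one false hit, and the two engines share two of the five peptides reported by either.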
Applied to real metaproteomics data from deep-sea hydrothermal vents, DeepMeta significantly increased the peptide identification rate (by 90% relative to the one-step search and by 76% relative to the two-step search) and uncovered new microbial communities in the vents. Integrating deep learning predictions in DeepMeta improves peptide identification in metaproteomics and facilitates biological discoveries in microbial communities.