Thang V. Pham (Amsterdam / NL), Robin Richardson (Amsterdam / NL), Alex Henneman (Amsterdam / NL), Connie Jimenez (Amsterdam / NL)
Introduction
A mass spectral library is typically required for the analysis of data-independent acquisition mass spectrometry (DIA-MS) data. It has been shown that deep neural networks can learn from experimental data to predict all essential components of a spectral library, including mass spectra, retention time, and ion mobility, from input peptide sequences. However, the need for a different deep learning model for each component makes this approach hard to use in practice. We therefore aim to develop a single software tool that supports all components of a spectral library.
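A predicted mass spectrum is typically a set of fragment-ion intensities. The m/z values that a spectral library pairs with those intensities follow deterministically from the peptide sequence. The following is a minimal sketch that assumes unmodified peptides, standard monoisotopic residue masses, and b/y fragment ions only. The function names `precursor_mz` and `fragment_mz` are hypothetical and are not part of the aiproteomics API.

```python
from __future__ import annotations

# Monoisotopic residue masses (Da) of the 20 standard amino acids (no modifications).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton


def precursor_mz(peptide: str, charge: int) -> float:
    """m/z of the intact [M + zH]^z+ precursor ion."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def fragment_mz(peptide: str, charge: int = 1) -> dict[str, list[float]]:
    """m/z of b1..b(n-1) and y1..y(n-1) ions at the given fragment charge."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_neutral = sum(masses[:i])            # N-terminal fragment of length i
        y_neutral = sum(masses[-i:]) + WATER   # C-terminal fragment of length i
        b_ions.append((b_neutral + charge * PROTON) / charge)
        y_ions.append((y_neutral + charge * PROTON) / charge)
    return {"b": b_ions, "y": y_ions}


print(round(precursor_mz("PEPTIDE", 1), 4))  # 800.3672
```

A library generator would attach predicted intensities, retention time, and ion mobility to these computed m/z values. Modified residues, such as phosphorylated S/T/Y, would need their mass shifts added to `RESIDUE_MASS`.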
Methods
We use a transformer-based deep learning model for all prediction tasks. Transformers process all positions of an input sequence in parallel, achieve high accuracy, and have demonstrated state-of-the-art performance on a wide range of problems, including retention time prediction for phosphopeptides (Pham et al. Proteomics 2023 Apr;23(7-8):e2200041). We have developed a spectral library generation module that produces a spectral library ready for use by the DIA-MS processing tool DIA-NN (Demichev et al. Nat Methods. 2020 Jan;17(1):41-44).
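The parallelism of transformers comes from self-attention. Every residue attends to every other residue through matrix products, with no step-by-step recurrence over the sequence. The following is a minimal single-head sketch in plain Python. It uses toy random embeddings and weights in place of trained ones, and it omits positional encodings, multiple heads, feed-forward layers, and the output heads for retention time or spectra. It is not the aiproteomics model itself.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as nested lists."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (n x d_model) embeddings, one row per residue.
    w_q, w_k, w_v: (d_model x d_k) projection matrices.
    Returns (output of shape n x d_k, attention weights of shape n x n).
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_q[0])
    k_t = [list(col) for col in zip(*k)]          # transpose K to (d_k x n)
    scores = matmul(q, k_t)                       # all residue pairs at once
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols):
    """Toy stand-in for learned weights: Gaussian random entries."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d_model, d_k = 8, 4
# Toy, untrained embedding table: one random vector per standard amino acid.
embedding = {aa: [random.gauss(0.0, 1.0) for _ in range(d_model)]
             for aa in "ACDEFGHIKLMNPQRSTVWY"}
w_q, w_k, w_v = (random_matrix(d_model, d_k) for _ in range(3))
x = [embedding[aa] for aa in "PEPTIDE"]   # one row per residue
out, attn = self_attention(x, w_q, w_k, w_v)
```

Each row of `attn` is a probability distribution over the residues of the peptide. In a real model, these operations run as batched tensor operations on a GPU rather than as Python loops.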
Results
The open-source Python package aiproteomics, available at https://github.com/aiproteomics/aiproteomics, is being developed with a view to sustainability by applying best practices for research software. It can be installed with the standard Python package manager, pip. Tutorial notebooks for each task give users a quick start. The output spectral library can be used directly by DIA-NN for DIA-MS data processing. Finally, we are benchmarking the accuracy of the transformer models against state-of-the-art methods for DIA-MS-based phosphoproteomics.
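DIA-NN can read text spectral libraries as tables with one row per fragment ion. The following sketch writes predicted entries in that long, tab-separated layout using Python's `csv` module. The column names in `COLUMNS` are assumptions modelled on common DIA-NN/OpenSWATH-style headers and should be checked against the DIA-NN documentation for the version in use. The classes `Fragment` and `LibraryEntry` and the function `write_library` are hypothetical and are not the aiproteomics API.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass, field


@dataclass
class Fragment:
    ion_type: str       # "b" or "y"
    series_number: int  # position in the ion series, e.g. 1 for y1
    charge: int
    mz: float
    intensity: float    # predicted relative intensity


@dataclass
class LibraryEntry:
    peptide: str
    charge: int
    precursor_mz: float
    retention_time: float  # predicted retention time (units depend on the model)
    ion_mobility: float    # predicted ion mobility, e.g. 1/K0
    fragments: list[Fragment] = field(default_factory=list)


# ASSUMED column names, modelled on common DIA-NN/OpenSWATH-style headers;
# verify them against the DIA-NN documentation before use.
COLUMNS = [
    "ModifiedPeptide", "PrecursorCharge", "PrecursorMz", "Tr_recalibrated",
    "IonMobility", "FragmentType", "FragmentSeriesNumber", "FragmentCharge",
    "ProductMz", "LibraryIntensity",
]


def write_library(entries, handle) -> int:
    """Write entries as a tab-separated table, one row per fragment ion.

    Returns the number of fragment rows written (header excluded).
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    rows = 0
    for entry in entries:
        for frag in entry.fragments:
            writer.writerow([
                entry.peptide, entry.charge, f"{entry.precursor_mz:.4f}",
                entry.retention_time, entry.ion_mobility,
                frag.ion_type, frag.series_number, frag.charge,
                f"{frag.mz:.4f}", frag.intensity,
            ])
            rows += 1
    return rows


# Toy entry: m/z values are computed for unmodified PEPTIDE (2+);
# retention time, ion mobility and intensities are placeholders, not predictions.
entry = LibraryEntry(
    peptide="PEPTIDE", charge=2, precursor_mz=400.6872,
    retention_time=25.0, ion_mobility=0.85,
    fragments=[
        Fragment("y", 1, 1, 148.0604, 1.0),
        Fragment("b", 2, 1, 227.1026, 0.4),
    ],
)
buffer = io.StringIO()
n_rows = write_library([entry], buffer)
tsv_text = buffer.getvalue()
```

To write a file instead of an in-memory buffer, pass a handle opened with `open(path, "w", newline="")`. The long format repeats the precursor-level fields on every fragment row, which keeps each row self-describing at the cost of some redundancy.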
Conclusions
The open-source Python package aiproteomics enables spectral library generation for DIA-MS data analysis using state-of-the-art transformer-based deep learning models. We also provide guidelines for community contributions.