Liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) is widely employed in proteomics studies. Most commonly, tandem mass spectra (MS2) serve for identification, while MS1 spectra are used for quantification. However, relying solely on MS2 for identification introduces biases and leaves part of the signal available in MS1 uncovered. In addition, MS2 scans make up around 90% of all acquired scans, equivalent to ~75% of measurement time, yet they yield redundant information in multi-run experiments. To make better use of MS1 data, we introduce Scan-Wise Activation and Peak Selection (SWAPS) for MS1-based joint identification and quantification of peptides, with the aim of reducing the dependency on MS2 scans, freeing up measurement time, and increasing throughput.
In SWAPS, MS1 signals are deconvolved scan by scan. For each MS1 scan, we construct a dictionary of potentially observable peptides, drawn from either an in silico digestion or prior deep proteome measurements and described by their accurate mass, isotope pattern, predicted retention time (RT), charge state, and ion mobility (IM). Joint identification and quantification is formulated as a sparse coding problem, i.e., finding a sparse representation of the spectrum as a linear combination of dictionary peptides; the resulting coefficients are termed scan-wise activations. The scan-wise activations of each peptide are concatenated to form an activation curve, which represents the inferred elution profile of that peptide. As a post-processing step, a convolutional neural network (CNN) is trained to detect and classify peaks in an activation curve in case the sparse coding generates multiple activation peaks.
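The sparse coding formulation can be illustrated with a small toy sketch in Python. This is not the SWAPS implementation: the Gaussian isotope templates, the non-negative L1-penalised least-squares objective, the projected ISTA solver, the simulated spectra, and all names and parameter values (isotope_template, nonneg_lasso, lam, sigma, the three hypothetical peptides) are assumptions made purely for illustration. The sketch solves one sparse coding problem per simulated MS1 scan and stacks the coefficients into per-peptide activation curves.

```python
import numpy as np

ISOTOPE_SPACING = 1.00335  # approximate 13C-12C mass difference in Da


def isotope_template(mz_axis, mono_mz, charge, rel_intensities, sigma=0.01):
    """Toy isotope envelope: one Gaussian per isotope peak, scaled to unit norm."""
    template = np.zeros_like(mz_axis)
    for k, rel in enumerate(rel_intensities):
        center = mono_mz + k * ISOTOPE_SPACING / charge
        template += rel * np.exp(-0.5 * ((mz_axis - center) / sigma) ** 2)
    return template / np.linalg.norm(template)


def nonneg_lasso(D, y, lam=0.05, n_iter=500):
    """Minimise 0.5*||y - D a||^2 + lam*sum(a) subject to a >= 0 (projected ISTA)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)
        # Gradient step, L1 shrinkage, and projection onto a >= 0 in one line.
        a = np.maximum(0.0, a - step * (grad + lam))
    return a


# Hypothetical dictionary of three candidate peptides on a shared m/z grid.
mz_axis = np.arange(500.0, 503.0, 0.002)
envelope = [1.0, 0.6, 0.25]  # assumed relative isotope intensities
D = np.column_stack([
    isotope_template(mz_axis, 500.5, 2, envelope),  # peptide A
    isotope_template(mz_axis, 501.0, 2, envelope),  # peptide B, overlaps A's isotopes
    isotope_template(mz_axis, 501.8, 3, envelope),  # peptide C, absent from the sample
])

# Simulated elution: A peaks at scan 8, B at scan 12, C never elutes.
scans = np.arange(20)
true_curves = np.column_stack([
    100.0 * np.exp(-0.5 * ((scans - 8) / 2.0) ** 2),
    60.0 * np.exp(-0.5 * ((scans - 12) / 2.0) ** 2),
    np.zeros(scans.size),
])
rng = np.random.default_rng(0)
spectra = true_curves @ D.T + rng.normal(0.0, 0.01, (scans.size, mz_axis.size))

# Scan-wise activation: one sparse coding problem per MS1 scan; stacking the
# per-scan coefficients gives one activation curve per peptide (columns).
curves = np.vstack([nonneg_lasso(D, spectrum) for spectrum in spectra])

for name, curve in zip("ABC", curves.T):
    print(f"peptide {name}: max activation {curve.max():.2f}, summed {curve.sum():.2f}")
```

In this toy setting, the L1 penalty keeps the absent peptide C at (near) zero activation, while the overlapping isotope envelopes of A and B are separated by the joint fit. Real dictionaries would additionally be restricted per scan by predicted RT and IM, which this sketch omits.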
Our results show that, with accurate RT from single-shot experiments, peptide quantities from SWAPS (using MS1 data only) and MaxQuant (using both MS1 and MS2 data) reach a Pearson correlation coefficient of 0.95. Moreover, SWAPS identified and quantified 82% of all peptide sequences detected in a corresponding deep proteome measurement, compared to only 28% identified in the same data by MaxQuant. With predicted RT, SWAPS tends to recover multiple activation peaks for each peptide, leading to overestimation of peptide quantities. However, with the CNN-based peak selection model, the true peak can be identified for 85% of the peptides, resulting in a final Pearson correlation of 0.78. We are currently working on further improving peak selection accuracy by integrating ion mobility, as well as charge distribution and ionization efficiency.
In conclusion, we demonstrate the feasibility of leveraging MS1 information with SWAPS and its potential to achieve higher throughput and greater measurement depth without additional experimental effort, enhancing the efficiency of proteomics analyses.