Lukasz Szyrwiel (Berlin / DE), Justus L. Grossmann (Berlin / DE), Ludwig Roman Sinn (Berlin / DE), Vadim Demichev (Berlin / DE)
Data-independent acquisition (DIA) proteomics has recently seen rapid growth in popularity as a robust analytical tool in basic and translational research. New mass spectrometry instrumentation has pushed the boundaries of sensitivity and speed and, in tandem with novel acquisition schemes and advances in analysis software, enables innovative applications, including spatial and single-cell proteomics. The high diversity of sample types and DIA methods calls for a reliable framework to evaluate and optimise the accuracy of protein quantification and to ensure that the quantitative methods scale well to large experiments.
In this work, we designed and recorded a large-scale benchmark dataset to thoroughly examine the performance of quantification algorithms in DIA proteomics. The dataset contains a series of multi-species mixes similar to the LFQbench approach, but with the crucial difference that our design introduces a variable background to mimic natural proteome variation in a controlled fashion. The digest mixes were recorded in a high-throughput setting with fast gradients (200 to 500 samples/day) on an Evosep One system coupled to the latest-generation Bruker timsTOF Ultra mass spectrometer using dia-PASEF and slice-PASEF. Injection amounts ranged from 15 ng down to 0.75 ng for the human background matrix. Our dataset thus makes it possible to investigate the effects of injection amount, chromatography method, acquisition scheme and sample heterogeneity on the accuracy, precision and reproducibility of quantification. It further makes it possible to verify that the data processing algorithms used can cope with the challenges posed by large experiment size and sample diversity.
We use the new benchmark dataset to validate our machine learning-driven quantification framework QuantUMS (Quantification using an Uncertainty Minimising Solution), integrated in the DIA-NN software. We find that not only ultra-fast chromatographic gradients but also low sample amounts exacerbate the inherent signal ratio compression in DIA proteomics. We further observe that QuantUMS uniquely alleviates the ratio compression, thus enabling comprehensive quantitative proteomics of low sample amounts at a throughput of hundreds of samples per day. Our results also show that QuantUMS enables reliable quantification irrespective of the dataset size or the presence of acquisitions with sample amounts spanning an order of magnitude within the same experiment, meeting the demands of recent challenging applications of DIA proteomics.
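To make the notion of ratio compression concrete, the following is a minimal sketch of how it might be measured in a multi-species benchmark, where the expected fold change of a spiked-in species between two mixes is known by design. All names and numbers here (`compression_factor`, `quant_mix_a`, `quant_mix_b`, the 4-fold expected ratio) are hypothetical illustrations; they are not taken from the dataset, and this is not the QuantUMS algorithm.

```python
import math
from statistics import median


def compression_factor(quant_a, quant_b, expected_ratio):
    """Return median observed log2(A/B) divided by the expected log2 ratio.

    A value of 1.0 means no compression; values below 1.0 mean the
    measured fold changes are smaller than the designed fold change.
    """
    observed = [math.log2(a / b) for a, b in zip(quant_a, quant_b)]
    return median(observed) / math.log2(expected_ratio)


# Hypothetical protein quantities for one spiked-in species in two mixes.
quant_mix_a = [400.0, 900.0, 1600.0]
quant_mix_b = [200.0, 300.0, 800.0]

# Assumed design: the species is 4-fold more abundant in mix A than in mix B.
factor = compression_factor(quant_mix_a, quant_mix_b, expected_ratio=4.0)
print(f"compression factor: {factor:.2f}")
```

In this toy case the median measured fold change is 2-fold against a designed 4-fold change, so the factor on the log2 scale is 0.5. A summary of this kind could be computed per injection amount or gradient length to compare conditions.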
In summary, we present a large-scale and diverse controlled benchmark dataset acquired using the latest-generation mass spectrometry platform. This dataset provides insights into the quantitative performance of DIA at the limits of instrument speed and sensitivity, and makes it possible to evaluate, optimise and validate next-generation computational methods for protein quantification.