Tanja Ziesmann (Mainz / DE), Maximilian Sprang (Mainz / DE; Kaiserslautern / DE), Dana Hein (Mainz / DE), David Gomez-Zepeda (Mainz / DE; Heidelberg / DE), Stefan Tenzer (Mainz / DE; Heidelberg / DE), Ute Distler (Mainz / DE)
Background: Proteomics data acquired through mass spectrometry need further processing to be human-readable. However, the output tables produced by processing software can require large amounts of resources, may not be readable with Excel, and might include information that is not required for further analysis, such as creating plots or performing statistical tests. Creating a workflow customised to a lab's needs helps to solve this problem.
Methods: Data acquired using timsTOF (Bruker Daltonics) or Orbitrap (Thermo Fisher Scientific) instruments were analysed with the processing software DIA-NN, MaxQuant, or Spectronaut. The resulting tabular outputs in software-specific formats were further analysed with a downstream processing pipeline programmed in Python and R. The user provides a design file that links each instrument file to its experimental condition. Using the information provided by the user in the config file, the pipeline executes the desired analyses. Most analysis steps are independent of each other and can be chosen individually; however, some plots depend on the results of the statistical analysis, e.g. a t-test, as they visualise the significantly enriched proteins.
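The relationship between the design file, the config file, and dependent analysis steps can be illustrated with a minimal sketch. The step names, the dependency map, and the field names `file`, `condition`, and `analyses` are hypothetical and chosen for illustration only; they are not taken from the pipeline itself.

```python
# Hypothetical step names and dependencies (illustration only).
STEP_DEPENDENCIES = {
    "pca_plot": [],
    "violin_plot": [],
    "t_test": [],
    "volcano_plot": ["t_test"],  # needs the significance results of the test
}


def resolve_steps(requested):
    """Return the requested steps in a runnable order, adding prerequisites."""
    ordered = []

    def visit(step):
        if step not in STEP_DEPENDENCIES:
            raise ValueError(f"Unknown analysis step: {step}")
        for dependency in STEP_DEPENDENCIES[step]:
            visit(dependency)
        if step not in ordered:
            ordered.append(step)

    for step in requested:
        visit(step)
    return ordered


def group_files_by_condition(design_rows):
    """Map each experimental condition to its list of instrument files."""
    groups = {}
    for row in design_rows:
        groups.setdefault(row["condition"], []).append(row["file"])
    return groups


# Assumed in-memory form of a design file and a config file.
design = [
    {"file": "run01", "condition": "control"},
    {"file": "run02", "condition": "control"},
    {"file": "run03", "condition": "treated"},
]
config = {"analyses": ["volcano_plot", "pca_plot"]}

steps = resolve_steps(config["analyses"])
groups = group_files_by_condition(design)
```

In this sketch, requesting a volcano plot automatically schedules the t-test before it, while independent steps such as the PCA plot run without prerequisites.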
Results: With minimal user input, the pipeline unifies the different output formats and produces standardized output files as well as an HTML report. The report describes the different plots and output files and documents the settings used for the statistical analysis. It can easily be shared between collaboration partners. The pipeline is modular, so the user can obtain either a standardized tabular output alone or additional plots and tables, as specified in the config file. The user can choose among different statistical tests for either pairwise or groupwise comparisons. The plots include volcano plots, PCA plots, UpSet plots, violin plots, and more. Other outputs include an Excel file containing the filtered output in wide format for easy use in, e.g., R, and another Excel file containing the results of the statistical tests for each protein and comparison. Additional filters can be applied, e.g. for the removal of contaminants or of proteins with low peptide support.
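The unification, filtering, and wide-format steps described above might look roughly as follows. This is a sketch under stated assumptions, not the pipeline's implementation: the tool labels `tool_a` and `tool_b`, all column names, the `CON__` contaminant prefix, and the threshold of two peptides are hypothetical placeholders.

```python
# Hypothetical mappings from tool-specific column names to unified names.
COLUMN_MAPS = {
    "tool_a": {"ProteinId": "protein", "File": "run",
               "Abundance": "intensity", "PeptideCount": "n_peptides"},
    "tool_b": {"Accession": "protein", "RawFile": "run",
               "Quantity": "intensity", "NumPeptides": "n_peptides"},
}


def unify(rows, tool):
    """Rename tool-specific columns to the unified names, dropping the rest."""
    mapping = COLUMN_MAPS[tool]
    return [{mapping[k]: v for k, v in row.items() if k in mapping}
            for row in rows]


def filter_rows(rows, min_peptides=2, contaminant_prefix="CON__"):
    """Drop assumed contaminants and proteins with low peptide support."""
    return [r for r in rows
            if not r["protein"].startswith(contaminant_prefix)
            and r["n_peptides"] >= min_peptides]


def to_wide(rows):
    """Pivot long-format rows into {protein: {run: intensity}}."""
    wide = {}
    for r in rows:
        wide.setdefault(r["protein"], {})[r["run"]] = r["intensity"]
    return wide


# Made-up example rows in the hypothetical "tool_a" format.
raw = [
    {"ProteinId": "P1", "File": "run01", "Abundance": 100.0, "PeptideCount": 3},
    {"ProteinId": "P1", "File": "run02", "Abundance": 120.0, "PeptideCount": 3},
    {"ProteinId": "P2", "File": "run01", "Abundance": 50.0, "PeptideCount": 1},
    {"ProteinId": "CON__K1", "File": "run01", "Abundance": 900.0, "PeptideCount": 5},
]

wide = to_wide(filter_rows(unify(raw, "tool_a")))
```

Keeping the column mapping per tool in one place means that all downstream steps only ever see the unified column names, regardless of which processing software produced the input.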
Conclusions: Our pipeline takes the output tables of different processing software and creates a unified table suitable for further processing, as well as a detailed report that includes several different plots, descriptions, and statistical analyses. It allows fast analysis and visualisation of proteomic datasets, reducing hands-on user time to a minimum.