David Teschner (Mainz / DE), David Gomez-Zepeda (Mainz / DE; Heidelberg / DE), Mateusz Krzysztof Łacki (Mainz / DE), Tim Maier (Mainz / DE), Stefan Tenzer (Mainz / DE; Heidelberg / DE), Andreas Hildebrandt (Mainz / DE)
Mass spectrometry continues to advance as a crucial technology for analyzing and quantifying the chemical composition of biological samples. Data-Independent Acquisition (DIA) techniques have become key for handling complex data from large-scale studies. The timsTOF platform has gained prominence in the field of DIA, utilizing ion-mobility separation to implement innovative acquisition strategies collectively known as Parallel Accumulation-Serial Fragmentation (PASEF).
The latest PASEF advancements expand the range of possible experiments but also require researchers to optimize their acquisition parameters. They also call for new data processing algorithms that fully exploit the rich information contained in the resulting raw data.
To address these challenges, we present TimSim, a simulation framework for generating in-silico proteomics data specific to timsTOF devices. TimSim reproduces the full range of raw proteomic mass spectrometry data generated by timsTOF instruments, allowing thorough investigation of how different acquisition methods affect the information content of a dataset. Furthermore, TimSim supports the generation of fully labeled data, down to individual peak annotations, providing a resource that complements wet-lab experimental data for the rapid development and evaluation of new data processing algorithms. We use a two-language approach: lower-level Rust code implements computationally demanding tasks such as frame generation, while a user-friendly Python interface exposes the tool to users. Built with extensibility in mind, TimSim also enables the integration of emerging machine learning and deep learning models, supporting continued innovation in mass spectrometry data analysis.
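
To illustrate why peak-level labels matter for algorithm evaluation, the following minimal Python sketch scores a toy intensity-threshold detector against simulated ground truth. It is not TimSim code and does not use the TimSim API: the names `LabeledPeak`, `simulate_peaks`, `threshold_detector`, and `precision_recall`, as well as the intensity distributions, are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LabeledPeak:
    """Hypothetical labeled peak; not a TimSim data structure."""
    mz: float
    intensity: float
    peptide_id: Optional[int]  # None marks a noise peak


def simulate_peaks(n_signal: int, n_noise: int, seed: int = 0) -> List[LabeledPeak]:
    """Draw toy signal and noise peaks (assumed distributions, illustration only)."""
    rng = random.Random(seed)
    peaks = []
    for i in range(n_signal):
        # Signal peaks: assumed roughly Gaussian intensities, clamped at zero.
        intensity = max(0.0, rng.gauss(1000.0, 200.0))
        peaks.append(LabeledPeak(rng.uniform(300.0, 1500.0), intensity, i % 5))
    for _ in range(n_noise):
        # Noise peaks: assumed exponentially distributed, mean intensity 50.
        peaks.append(LabeledPeak(rng.uniform(300.0, 1500.0), rng.expovariate(1 / 50.0), None))
    return peaks


def threshold_detector(peaks: List[LabeledPeak], min_intensity: float) -> List[bool]:
    """Toy detector: call a peak 'signal' if its intensity reaches the threshold."""
    return [p.intensity >= min_intensity for p in peaks]


def precision_recall(peaks: List[LabeledPeak], predicted: List[bool]) -> Tuple[float, float]:
    """Score predictions against the ground-truth labels carried by each peak."""
    tp = sum(1 for p, hit in zip(peaks, predicted) if hit and p.peptide_id is not None)
    fp = sum(1 for p, hit in zip(peaks, predicted) if hit and p.peptide_id is None)
    fn = sum(1 for p, hit in zip(peaks, predicted) if not hit and p.peptide_id is not None)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    peaks = simulate_peaks(n_signal=100, n_noise=300)
    for threshold in (0.0, 200.0, 600.0):
        p, r = precision_recall(peaks, threshold_detector(peaks, threshold))
        print(f"threshold={threshold:6.1f}  precision={p:.2f}  recall={r:.2f}")
```

Because every simulated peak carries its own label, precision and recall can be computed exactly for each threshold, which is not possible with unlabeled experimental data.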