The histories of computing, artificial intelligence (AI) and mass spectrometry are intertwined, going back at least to the work of Sibyl Rock at Consolidated Engineering Corporation (CEC), who compiled the first computing manual for mass spectrometry in 1946 and, together with Clifford Berry, developed the first commercially available electronic computer. This work was driven by the need to solve problems in mass spectrometry. Another intersection is the development of an early expert system, Dendral, by Joshua Lederberg and colleagues at Stanford, for identifying chemical structures from mass spectra, representing a major milestone not only in mass spectrometry but also in the development of AI.
In recent decades, machine learning, in particular deep neural networks trained on thousands or millions of spectra, has significantly improved our ability to identify peptides and small molecules by predicting their observability, chromatographic behavior and fragmentation in tandem mass spectrometry. This presentation will highlight some of these recent developments, including our own contributions. By integrating these machine-learned models, it is already possible to generate realistic synthetic data. This has many potentially beneficial applications in system suitability testing, experimental design and method optimization, and in providing ground truth for benchmarking algorithms and computational workflows. It also carries some significant risks, which must be addressed.
Large language models, which can now learn from millions of documents describing mass spectrometers and mass spectrometry data, may provide further assistance to human experts in data interpretation, resuming the pioneering work of the DCRT/CIS conversational AI mass spectrometry search developed at NIH in the early 1970s.
Finally, some recent work on using generative AI to explore data analysis workflows for mass spectrometry will be presented. By building and running ensembles of workflows rather than one particular workflow, it may be possible to increase confidence in biological interpretations of proteomics data, as well as to learn something about the combined tools themselves.