In contrast to conventional closed database searches, open modification searches can consider hundreds or even thousands of different post-translational modifications (PTMs) in a single search, potentially increasing the identification rate of mass spectra. However, including numerous PTMs exponentially expands the search space, leading to long run times and a higher risk of false-positive matches. Here we present MS Andrea, a new open modification search engine that uses sequence tags to reduce the number of peptide candidates and thereby counter the expanding search space. MS Andrea can accurately identify peptides and their PTMs, providing insight into the modification landscape of proteins.
The first step of the MS Andrea search algorithm is spectral pre-processing, which comprises the removal of precursor peaks and standard deconvolution. Subsequently, peaks are selected from 100 m/z windows in the spectrum. Using these picked peaks, sequence tags with a length of three and four residues are extracted from each spectrum. Peptides from the protein database are then ordered according to their sequence tags, making it possible to extract only those peptides that contain at least one sequence tag found in the spectrum. This peptide candidate pool is further filtered using a wide precursor mass tolerance and then scored in two rounds using our in-house developed MS Amanda scoring function. In the first search round, all peptide candidates that remain after filtering by sequence tags and precursor tolerance are matched to the spectrum, considering only fixed modifications. In the second search round, the ten highest-scoring peptides are re-scored, this time considering combinations of PTMs that correspond to the delta mass between the uncharged precursor mass and the mass of the matched peptide. Currently, MS Andrea can identify up to two different types of variable modifications per peptide.
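The candidate filtering and delta-mass steps described above can be illustrated with a minimal Python sketch. It is not the MS Andrea implementation: the function names (`build_tag_index`, `candidates`, `explain_delta`), the small modification table, and the tolerance values are assumptions chosen for illustration. Tag extraction from peaks and the MS Amanda scoring function are not reproduced, so sequence tags are passed in directly, and each modification type is assumed to occur at most once per peptide.

```python
from collections import defaultdict
from itertools import combinations

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

# Illustrative modification table (assumption); a real search would use
# a much larger list of modifications.
MODS = {"Phospho": 79.96633, "Oxidation": 15.99491,
        "Acetyl": 42.01057, "Methyl": 14.01565}


def peptide_mass(seq):
    """Uncharged monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def build_tag_index(peptides, tag_lengths=(3, 4)):
    """Map every 3- and 4-residue substring to the peptides containing it."""
    index = defaultdict(set)
    for i, pep in enumerate(peptides):
        for k in tag_lengths:
            for start in range(len(pep) - k + 1):
                index[pep[start:start + k]].add(i)
    return index


def candidates(index, peptides, tags, precursor_mass, wide_tol):
    """Peptides sharing at least one tag and within the wide mass tolerance."""
    hits = set()
    for tag in tags:
        hits |= index.get(tag, set())
    return sorted(
        peptides[i] for i in hits
        if abs(precursor_mass - peptide_mass(peptides[i])) <= wide_tol
    )


def explain_delta(delta, mods, tol=0.01, max_types=2):
    """Combinations of up to max_types modification types matching delta."""
    matches = []
    for n in range(1, max_types + 1):
        for combo in combinations(sorted(mods), n):
            if abs(sum(mods[m] for m in combo) - delta) <= tol:
                matches.append(combo)
    return matches


if __name__ == "__main__":
    db = ["PEPTIDEK", "SAMPLER", "TIDELAGK"]
    index = build_tag_index(db)
    precursor = peptide_mass("PEPTIDEK") + MODS["Phospho"]
    for pep in candidates(index, db, ["TID"], precursor, wide_tol=100.0):
        delta = precursor - peptide_mass(pep)
        print(pep, round(delta, 4), explain_delta(delta, MODS))
```

In this toy example, the tag `TID` selects two of the three database peptides, the wide precursor tolerance removes one of them, and the remaining delta mass is matched against single modifications and pairs of modification types. In a full search, the filtered candidates would be scored before any delta-mass interpretation.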
To evaluate the performance of MS Andrea, we analyzed two different datasets: phosphopeptides and peptides derived from HeLa cells. We compared the identification results with those obtained using the well-established closed search engine MS Amanda. Initial findings indicate a substantial overlap in peptide-spectrum matches between the two search engines across both datasets at 1% estimated false discovery rate. We are continuing to refine and develop MS Andrea to achieve more efficient and accurate identification of modified peptides. This work also includes comparing MS Andrea to other open modification search engines such as MSFragger.