  • Poster presentation
  • P-II-0490

Optimizing search engines to detect marginal peptide matches using large-scale matched-matrix "truth" datasets

Schedule

Location / Stream:
New Technology: AI and Bioinformatics in Mass Spectrometry

Topic

  • New Technology: AI and Bioinformatics in Mass Spectrometry

Contributors

Brian Searle (Columbus, OH / US)

Abstract

Detecting peptides with data-independent acquisition (DIA) analysis relies on search engines that identify signals associated with those peptides. Search engine algorithms are typically developed with training datasets built from existing computational annotations, which can reinforce biases from older-generation search engines. One goal of new search engines is to correctly detect more peptides from marginal data, but these matches are also the most likely to include errors. Peptide matches with poor fragmentation or low signal are also the most challenging to curate manually. While large-scale peptide libraries such as ProteomeTools eliminate computational bias because they are built from synthesized peptides, they introduce their own biases from peptide selection and synthesis limitations.

We developed an approach to generate large-scale DIA "truth sets" for training search engines using matrix-matched calibration curves. Peptides at the undiluted points of the calibration curve produce the highest signal and are the easiest for both existing search engines and scientists to interpret. As those peptides are diluted into a matrix-matched background, increasing interference and decreasing signal make matches more challenging to validate. However, because the matrix stays constant, retention times and fragmentation patterns remain the same, allowing us to use accurate mass and retention time tags to confirm the signal of those peptides without needing detections from existing search engines.
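The idea of carrying accurate mass and retention time tags from the undiluted sample into the diluted points can be sketched roughly as follows. This is only an illustrative sketch, not the method used in this work: the `Tag` and `Peak` structures, the function names, and the tolerance values (`mz_tol`, `rt_tol_min`) are all hypothetical and chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Tag:
    """Mass and retention time tag for a peptide validated in the undiluted sample.

    Hypothetical structure for illustration only.
    """
    peptide: str
    mz: float      # m/z of the validated signal
    rt_min: float  # retention time in minutes


@dataclass(frozen=True)
class Peak:
    """One observed signal in a run (hypothetical structure)."""
    mz: float
    rt_min: float
    intensity: float


def find_tagged_signal(
    tag: Tag,
    peaks: List[Peak],
    mz_tol: float = 0.3,      # assumed tolerance, not taken from the abstract
    rt_tol_min: float = 0.5,  # assumed tolerance, not taken from the abstract
) -> Optional[Peak]:
    """Return the most intense peak within both tolerances of the tag, or None."""
    candidates = [
        p for p in peaks
        if abs(p.mz - tag.mz) <= mz_tol and abs(p.rt_min - tag.rt_min) <= rt_tol_min
    ]
    return max(candidates, key=lambda p: p.intensity, default=None)


def build_truth_set(
    tags: List[Tag],
    runs: Dict[float, List[Peak]],
) -> List[Tuple[str, float, bool]]:
    """For each dilution point, record whether each tagged peptide's signal is found.

    `runs` maps a dilution fraction (1.0 = undiluted) to the peaks seen in that run.
    No search engine is consulted: the tag alone decides where to look.
    """
    truth = []
    for fraction, peaks in runs.items():
        for tag in tags:
            match = find_tagged_signal(tag, peaks)
            truth.append((tag.peptide, fraction, match is not None))
    return truth
```

Because the matrix is held constant, the same tag can be reused at every dilution point, which is what lets low-signal observations be labeled without relying on an existing search engine's detections.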

To test this approach, we used a DIA-based matrix-matched curve generated with a linear ion trap (LIT). LITs are affordable and robust mass analyzers that provide fast scanning speeds with high sensitivity; their only significant disadvantage is inferior mass accuracy compared to time-of-flight or Orbitrap mass analyzers. That lower mass accuracy makes low signal-to-noise detections challenging to measure confidently. We developed a rapid manual validation tool to confirm tens of thousands of detections in the undiluted sample and then used the dilution curve to provide hundreds of thousands of observations at lower signal-to-noise. We further demonstrate the benefit of the new training sets for improving DIA scoring functions in EncyclopeDIA.
