• Poster presentation
  • P-II-0733

Proteogenomics identifies conserved and lineage-specific novel small proteins in clinical Mycobacterium tuberculosis reference strains

Appointment

Location / Stream:
Infectious Biology Insights

Topic

  • Infectious Biology Insights

Abstract

Objectives: Accurate and comprehensive prediction of all protein-coding genes remains an unresolved problem. Genes encoding small proteins are often missed in genome annotations, even in well-studied bacterial model organisms such as Escherichia coli [1] and Pseudomonas aeruginosa [2]. Yet these small proteins can carry out many important functions [3]. Using proteogenomics, we aim to identify protein-coding genes missed by conventional genome annotation methods in six Mycobacterium tuberculosis clinical reference strains from lineages 1 and 2. Tuberculosis, caused by M. tuberculosis, is among the leading bacterial infectious diseases worldwide, and the modern lineage 2 is especially virulent.

Methods: We assembled complete genomes of six clinical reference strains de novo from long-read PacBio data. By hierarchically integrating reference annotations, ab initio gene predictions and a modified six-frame translation that considers alternative start codons, we created large but minimally redundant integrated proteogenomics search databases (iPtgxDBs) [4], in which ~95% of peptides unambiguously identify one protein [5]. Total cell extracts were analyzed by Parallel Accumulation-Serial Fragmentation (PASEF) mass spectrometry (MS). After strict false discovery rate (FDR) filtering and validation, we prioritized novel small proteins based on functional predictions and conservation.
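As an illustration of the six-frame translation step, the following minimal Python sketch enumerates every open reading frame in all six frames and reports one candidate protein per in-frame start codon. It is not the iPtgxDB implementation: the start codon set (ATG, GTG, TTG), the `min_aa` cutoff, and all function names are assumptions chosen for this example.

```python
from itertools import product

# Standard codon table (amino acids identical in NCBI tables 1 and 11), TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: the most common bacterial start codons; the real pipeline may use more.
START_CODONS = {"ATG", "GTG", "TTG"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def orfs_with_alternative_starts(dna, min_aa=10):
    """Yield (strand, frame, start_codon, protein) for each start codon in each
    of the six frames, extended to the next in-frame stop codon.

    The initiator codon is always written as 'M'. Candidates that run off the
    end of the sequence without a stop codon are not reported.
    """
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
            open_starts = []  # start codon positions seen since the last stop
            for idx, codon in enumerate(codons):
                if CODON_TABLE.get(codon) == "*":
                    for s in open_starts:
                        body = "".join(CODON_TABLE.get(c, "X") for c in codons[s + 1:idx])
                        protein = "M" + body
                        if len(protein) >= min_aa:
                            yield strand, frame, codons[s], protein
                    open_starts = []
                elif codon in START_CODONS:
                    open_starts.append(idx)


if __name__ == "__main__":
    toy = "CCGTGAAAAAAAAATTGGCTGCTTAAC"  # toy sequence, not real genome data
    for hit in orfs_with_alternative_starts(toy, min_aa=1):
        print(hit)
```

Reporting every alternative start of the same reading frame produces nested candidates that share a C-terminus. A search database built this way therefore needs a later step that collapses such redundancy, which this sketch does not attempt.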

Results: Assembling complete genomes allowed us to overcome the drawbacks of fragmented, short-read-based assemblies, which can miss even essential genes [2]. Comparative genomics identified a large core genome (94% of genes), consistent with the lack of horizontal gene transfer in Mycobacterium. Proteogenomics identified 35 novel proteins, which were enriched in proteins shorter than 100 amino acids and had a significantly higher average isoelectric point (pI) than annotated proteins. Interestingly, the novel candidates included strain- and lineage-specific proteins and three candidates with a predicted function in toxin-antitoxin systems.
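The two summary properties reported for the novel proteins, length below 100 amino acids and average pI, can be estimated from sequence alone. The sketch below is a rough illustration rather than the method used in this study: it finds the pH of zero net charge by bisection over the Henderson-Hasselbalch equation, the pKa values are one commonly used set and are an assumption here, and the function names are hypothetical.

```python
# Assumed side-chain and terminal pKa values; other published sets differ slightly.
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6


def net_charge(seq, ph):
    """Approximate net charge of a protein at a given pH."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))   # N-terminal amine
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))   # C-terminal carboxyl
    for aa in seq:
        if aa in PKA_POSITIVE:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            negative += 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return positive - negative


def isoelectric_point(seq, tol=1e-4):
    """Bisect for the pH where the net charge crosses zero."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged, so the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


def summarize(proteins, max_len=100):
    """Return (fraction shorter than max_len, mean estimated pI)."""
    short = sum(1 for p in proteins if len(p) < max_len)
    mean_pi = sum(isoelectric_point(p) for p in proteins) / len(proteins)
    return short / len(proteins), mean_pi
```

Comparing the output of `summarize` for novel and annotated protein sets would show the direction of any difference. Claiming significance, as the abstract does, additionally requires a statistical test that is not shown here.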

Conclusion: We successfully applied our proteogenomics framework for prokaryotes, which is available as a public web server (https://iptgxdb.expasy.org) [4], to six clinical reference strains of a major human pathogen [7]. By leveraging state-of-the-art tandem MS, we were able to identify novel small proteins with potential roles in pathogenicity without the need for subcellular fractionation.

References

[1]. Storz G et al. EcoSal Plus 2020, 9:10.

[2]. Varadarajan AR et al. NPJ Biofilms & Microbiomes 2020, 6:46.

[3]. Storz G et al. Annu Rev Biochem 2014, 83:753-777.

[4]. Omasits U et al. Genome Res 2017, 27:2083-2095.

[5]. Qeli E & Ahrens CH. Nat Biotechnol 2010, 28:647-650.

[7]. Heiniger B et al. (in preparation).