Poster

  • P-MDE-011

From Sequence to Function: A Comprehensive Evaluation of Prokaryotic Genome Annotation Pipelines Across Thousands of Genomes

Presented in

Poster Session 1

Authors

Mateusz Jundzill (Jena / DE), Riccardo Spott (Jena / DE), Mara Lohde (Jena / DE), Oliwia Makarewicz (Jena / DE), Mathias W. Pletz (Jena / DE), Christian Brandt (Jena / DE)

Abstract

Bacterial genome annotation is crucial for identifying genes, understanding bacterial biology and metabolic pathways, aiding strain classification, and discovering novel treatment targets. Existing publications focus only on "benchmark organisms" and on partial steps of the process, and therefore lack comprehensiveness.

We evaluated the tools' performance across the domains Bacteria and Archaea using all 14,675 species (reference genomes) registered in the Genome Taxonomy Database (GTDB). The stability of within-species annotation was checked on 24,385 Escherichia coli strains. The analysis covered four popular annotation tools: Prokaryotic Genome Annotation Pipeline (PGAP), Prokka, Bakta, and EggNOG-mapper. We annotated genomes with each tool's default or recommended settings and gauged annotation quality by metrics such as coding space, gene count, gene length, assigned GO terms, and feature count (e.g., rRNA, tRNA). Additionally, we simulated erroneous genomes with frameshifts by randomly deleting nucleotides (0.5%, 1%, 2%).
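The frameshift simulation described above can be illustrated with a minimal sketch. The abstract does not state which tool or exact procedure was used, so this assumes the percentages are fractions of the sequence length and that deletion positions are drawn uniformly at random. The function name `delete_random_nucleotides` and the toy sequence are hypothetical.

```python
import random


def delete_random_nucleotides(sequence: str, rate: float, seed: int = 0) -> str:
    """Return `sequence` with round(len(sequence) * rate) random positions removed.

    Assumption: `rate` is a fraction of the sequence length (0.01 means 1%),
    and positions are sampled uniformly without replacement.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed keeps the simulation reproducible
    n_delete = round(len(sequence) * rate)
    drop = set(rng.sample(range(len(sequence)), n_delete))
    return "".join(base for i, base in enumerate(sequence) if i not in drop)


# Hypothetical 10,000 bp toy sequence, not a real genome.
genome = "ATGC" * 2500

for rate in (0.005, 0.01, 0.02):
    mutated = delete_random_nucleotides(genome, rate)
    print(f"rate={rate}: removed {len(genome) - len(mutated)} nucleotides")
```

Because a single-base deletion shifts the reading frame of any coding sequence it falls in, even these low rates can disrupt many predicted genes, which is what makes the simulation a useful stress test for an annotation tool.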

Bakta annotates more coding space than the other tools in Bacteria, although other tools can outperform it at some lower taxonomic ranks. In Archaea, EggNOG-mapper and PGAP provide good coding space annotations. For metagenome-assembled genomes (MAGs), PGAP performs better, most likely due to its taxon-specific annotation. For GO terms, EggNOG-mapper provides the highest count of GO terms per gene, while PGAP performs well in gene coverage, measured as features with at least one GO term. The simulated erroneous genomes showed that PGAP maintained stable performance.
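The two GO metrics mentioned above can rank tools differently, which a small sketch makes concrete. It assumes "GO terms per gene" is averaged over all genes, including unannotated ones, and that "coverage" is the fraction of genes with at least one GO term. The abstract does not give the exact definitions, and the function `go_metrics` and both toy annotation sets are hypothetical.

```python
def go_metrics(go_terms_by_gene: dict[str, set[str]]) -> tuple[float, float]:
    """Return (mean GO terms per gene, fraction of genes with >= 1 GO term)."""
    if not go_terms_by_gene:
        raise ValueError("no genes supplied")
    counts = [len(terms) for terms in go_terms_by_gene.values()]
    terms_per_gene = sum(counts) / len(counts)
    coverage = sum(1 for c in counts if c > 0) / len(counts)
    return terms_per_gene, coverage


# Hypothetical tool A: many terms on one gene, nothing on the rest.
tool_a = {
    "gene1": {"GO:0008150", "GO:0003674", "GO:0005575", "GO:0006412"},
    "gene2": set(),
    "gene3": set(),
    "gene4": set(),
}

# Hypothetical tool B: one term each on most genes.
tool_b = {
    "gene1": {"GO:0008150"},
    "gene2": {"GO:0003674"},
    "gene3": {"GO:0005575"},
    "gene4": set(),
}

for name, annotation in (("A", tool_a), ("B", tool_b)):
    per_gene, coverage = go_metrics(annotation)
    print(f"tool {name}: {per_gene:.2f} GO terms/gene, {coverage:.0%} genes covered")
```

In this toy case tool A has the higher term count per gene while tool B covers more genes, which is why the two metrics are worth reporting separately.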

Based on our findings, Bakta generally provides the most comprehensive annotation for the Bacteria domain, whereas PGAP demonstrates superior performance in Archaea and MAGs. If functional GO annotation is important, EggNOG-mapper best balances a high GO term count with a reasonable count of hypothetical proteins. The performance of each tool may vary with taxon, genome type, genome correctness, and other factors. Nevertheless, the recommendations based on our findings apply to most use cases.
