Poster

  • P-MDE-005

The EDGAR platform for large-scale comparative genomics – recent developments and new features

Presented in

Poster Session 1

Authors

Max Pfister (Giessen / DE), Linda Fenske (Giessen / DE), Marius Dieckmann (Giessen / DE), Sebastian Beyvers (Giessen / DE), Aviral Jain (Giessen / DE), Jochen Blom (Giessen / DE), Alexander Goesmann (Giessen / DE)

Abstract

Introduction
EDGAR 3.2 provides precomputed orthology databases for more than 80,000 microbial genomes in public as well as private projects. The platform allows rapid identification of the differential gene content of kindred genomes, i.e., the pan genome, the core genome, and singleton genes. Furthermore, EDGAR provides a wide range of analysis and visualization features required for phylogenomic inter- and intraspecies taxonomic analyses, such as the calculation of core-genome-based phylogenetic trees and genome similarity matrices (AAI, ANI, POCP, fastANI). Since the latest update, it also offers a functional categorization of genes based on the KEGG, COG, and GO databases. Thus, the software enables a quick survey of evolutionary relationships and simplifies the process of obtaining new biological insights.
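
To make the three gene sets concrete, the following is a minimal sketch of how pan genome, core genome, and singletons can be derived once every gene has been assigned to an ortholog group. It is not EDGAR's implementation; the function name `partition_gene_content`, the input layout (genome name mapped to a set of ortholog-group IDs), and the toy identifiers are assumptions made for illustration.

```python
from collections import Counter


def partition_gene_content(genomes):
    """Split ortholog groups into pan genome, core genome, and singletons.

    `genomes` maps a genome name to the set of ortholog-group IDs found in
    that genome (a hypothetical input layout, not EDGAR's data model).
    """
    if not genomes:
        return set(), set(), {}
    group_sets = list(genomes.values())
    # Pan genome: every ortholog group seen in at least one genome.
    pan = set().union(*group_sets)
    # Core genome: ortholog groups present in all genomes.
    core = set.intersection(*group_sets)
    # Singletons: groups that occur in exactly one genome, reported per genome.
    counts = Counter(group for groups in group_sets for group in groups)
    singletons = {
        name: {group for group in groups if counts[group] == 1}
        for name, groups in genomes.items()
    }
    return pan, core, singletons


# Toy data with made-up genome and ortholog-group names.
example = {
    "strain_A": {"og1", "og2", "og3"},
    "strain_B": {"og1", "og2", "og4"},
    "strain_C": {"og1", "og5"},
}
pan, core, singletons = partition_gene_content(example)
```

In this toy data set, `og1` is the only group shared by all three strains, while `og2` is neither core nor singleton because it occurs in two of them.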

Objectives
Over the last decade, the average number of genomes analysed per EDGAR project has increased steadily. Consequently, recent EDGAR development had two goals: first, to create the infrastructure and accompanying backend processes needed to handle further increases in incoming data; second, to continue developing the EDGAR frontend, creating new ways for scientists to examine their genomes.

Methods
The backend for the calculation of orthologs and genomic subsets has been rewritten in Rust and is now deployed on an auto-scaling Kubernetes cluster. In the alignment workflow, BLAST was replaced by the much faster DIAMOND. Furthermore, we are currently iterating on the functional-category features, extending them so that users can interactively explore KEGG subcategories and compare the abundance of GO terms within their datasets.
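
The abstract does not state which orthology criterion the backend applies, so the sketch below uses plain reciprocal best hits, a common approach swapped in here purely for illustration. It assumes alignment results in the 12-column BLAST tabular layout (query ID in column 1, subject ID in column 2, bit score in column 12), which DIAMOND can also emit. The helper names `make_row`, `best_hits`, and `reciprocal_best_hits` and all gene identifiers are hypothetical, and the production backend is written in Rust, not Python.

```python
def make_row(query, subject, bitscore):
    """Build one 12-column tabular hit line; filler values are arbitrary."""
    return "\t".join([query, subject, "90.0", "100", "10", "0",
                      "1", "100", "1", "100", "1e-50", str(bitscore)])


def best_hits(rows):
    """Map each query to its (subject, bitscore) with the highest bit score."""
    best = {}
    for line in rows:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or truncated lines
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted(
        (query, subject)
        for query, (subject, _) in forward.items()
        if reverse.get(subject, (None, 0.0))[0] == query
    )


# Toy alignment results for two genomes A and B.
a_vs_b = [
    make_row("a1", "b1", 550.0),
    make_row("a1", "b2", 80.0),
    make_row("a2", "b2", 400.0),
    make_row("a3", "b1", 120.0),
]
b_vs_a = [
    make_row("b1", "a1", 548.0),
    make_row("b1", "a3", 119.0),
    make_row("b2", "a2", 398.0),
]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Here `a3` finds `b1` as its best hit, but `b1` prefers `a1`, so `a3` is left without an ortholog in genome B and would count toward the singletons of genome A in a two-genome comparison.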

Results
In EDGAR 3.2, functional category data were added for all genomes, together with the corresponding visualization features. The technical infrastructure was further optimized to scale with increasing query sizes and is currently used to process more than 30,000 genomes per year. These optimizations ensure that EDGAR 3.2 remains a convenient platform for comprehensive microbial gene content analysis.

The web server is accessible at: http://edgar3.computational.bio
