Jonas Coelho Kasmanas (Leipzig / DE), Michael Schloter (Leipzig / DE), André C.P.L.F. de Carvalho (Leipzig / DE), Peter Stadler (Leipzig / DE), Ulisses Nunes da Rocha (Leipzig / DE)
Despite sequencing advances, characterized profiles of the human gut microbiome remain limited, hindering microbial community classification. Using dense deep clustering, we standardized 509,610 metagenome-assembled genomes (MAGs) to create comprehensive microbiome fingerprints. The resulting interactive platform includes AutoML-driven bioindicator detection, which scans functional potential patterns across countries and disease conditions.
We selected 14,082 metagenomic runs (>20 million reads) from HumanMetagenomeDB (https://web.app.ufz.de/hmgdb/) and recovered 302,781 MAGs using MuDoGeR. We then added and harmonized 154,736 MAGs from Pasolli et al. (Cell, 2019) and 60,675 from Nayfach et al. (Nature, 2019) to create a unified, standardized dataset. After dereplication, we identified 6,794 species. All MAGs underwent gene annotation with Prokka, using the ISfinder, NCBI-RefSeq, UniProt, and HMM databases it provides. We created a presence/absence gene profile for each MAG from adult gut samples (>18 years), yielding 426,648 profiles over 40,424 non-redundant genes. To capture metagenomic fingerprints, we embedded these profiles with an autoencoder built from dense layers with rectified linear unit (ReLU) activations and clustered the embedding with DenMune. We analyzed MAG distribution across taxonomy, geography, and host medical conditions. Finally, we developed an interactive platform with AutoML-driven bioindicator detection that identifies potential key microbiome elements in the embedded space using the curated HumanMetagenomeDB metadata.
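The presence/absence profiling step can be illustrated with a minimal sketch. It assumes the annotations are already available as a mapping from MAG identifier to annotated gene names; the function name `build_profiles`, the MAG identifiers, and the toy gene lists are hypothetical and not taken from the study's pipeline.

```python
def build_profiles(mag_genes):
    """Turn per-MAG gene annotations into binary presence/absence vectors.

    mag_genes: dict mapping a MAG id to an iterable of annotated gene names
               (hypothetical input format, assumed for illustration).
    Returns (genes, profiles): the sorted non-redundant gene list and a dict
    mapping each MAG id to a 0/1 list aligned with that gene list.
    """
    # Non-redundant gene vocabulary shared by all MAGs.
    genes = sorted({gene for annotated in mag_genes.values() for gene in annotated})
    column = {gene: k for k, gene in enumerate(genes)}

    profiles = {}
    for mag_id, annotated in mag_genes.items():
        row = [0] * len(genes)
        for gene in annotated:
            row[column[gene]] = 1  # repeated copies still count as "present"
        profiles[mag_id] = row
    return genes, profiles


# Toy annotations (illustrative only).
annotations = {
    "MAG_A": ["dnaA", "recA", "recA"],
    "MAG_B": ["recA", "gyrB"],
}
genes, profiles = build_profiles(annotations)
```

Each row is a fixed-length binary vector, which is the kind of input a dense autoencoder can take directly. At the scale reported here (426,648 profiles over 40,424 genes) a sparse matrix representation would likely be preferable to Python lists.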
The embedded space revealed distinct microbiome profiles, with significant differences between control and colorectal cancer samples and a clear separation of libraries by country (PERMANOVA, p < 0.05). Additionally, the MAG clusters based on functional potential identified taxonomic groups that diverge from their majority cluster, which could signal relevant functional shifts in specific strains. Our AutoML bioindicator detection system uncovered geographical markers; Lachnospira eligens, for instance, emerged as a promising bioindicator for differentiating Chinese and USA gut microbiomes. This study provides a standardized dataset, a downloadable genome collection, and an interactive web app, expanding previously published resources for future microbiome research.
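For readers unfamiliar with PERMANOVA, the following is a minimal sketch of a one-way test on embedded coordinates. It assumes Euclidean distances between points in the embedding, which the abstract does not specify; the function names, the toy coordinates, and the group labels are hypothetical. The pseudo-F statistic follows the standard partitioning of squared distances into within-group and among-group sums, and the p-value comes from permuting the group labels.

```python
import random
from itertools import combinations
from math import dist


def pseudo_f(points, labels):
    """PERMANOVA pseudo-F from squared Euclidean distances between points."""
    n = len(points)
    groups = sorted(set(labels))
    a = len(groups)
    # Squared distance for every unordered pair (i < j).
    sq = {(i, j): dist(points[i], points[j]) ** 2
          for i, j in combinations(range(n), 2)}
    ss_total = sum(sq.values()) / n
    ss_within = 0.0
    for group in groups:
        members = [i for i in range(n) if labels[i] == group]
        ss_within += sum(sq[pair] for pair in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(points, labels, n_perm=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(points, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between position and group
        if pseudo_f(points, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


# Toy 2-D embedding with two well-separated groups (illustrative only).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.0, 0.3), (0.3, 0.0),
          (5.0, 5.0), (5.1, 5.2), (5.2, 5.1), (5.0, 5.3), (5.3, 5.0)]
labels = ["control"] * 5 + ["CRC"] * 5
f_obs, p_value = permanova(points, labels)
```

With real data, an established implementation (for example in a dedicated ecology or microbiome statistics package) would be the safer choice; the sketch is meant only to show what the reported p-value measures.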