There is a growing need to make life-science data findable, accessible, interoperable, and reusable (FAIR), which is particularly challenging in metagenomics. This field studies the functional potential of microorganisms in natural, host-associated, and constructed environments, using high-throughput sequencing data from total DNA isolated from microbial communities. Wet-lab work and computational analysis are often disconnected in this field, which leads to gaps in data provenance.
This study aims to improve the application of the FAIR principles in metagenomics by creating guidelines that connect the wet-lab and dry-lab parts of genomic sequencing projects. To this end, we explored provenance in over 50 projects involving more than 1000 samples from multiple sources, namely bioreactors, agricultural and forest soils, freshwater, wastewater, and the gut microbiome. We organized the guidelines into five parts: sample preparation, sequencing logistics, data downloading, integrity and quality checking of sequences, and preprocessing of the sequenced data up to assembly.
First, we analyzed provenance from sample preparation (sample storage and DNA extraction) for metagenomics. Our data indicated that DNA yield and quality require attention during sample preparation for sequencing, particularly when long-read sequencing is intended. We observed that both low and high DNA yields can lead to failed library preparation. A DNA quality check must be performed on every sample using a BioAnalyzer-like instrument. We also highlight that the memory configurations and computing resources required for sequencing projects must be considered to facilitate reproducibility, particularly for sequencing data preprocessing, which is resource-intensive. Reviewing and commenting must be implemented for the automatic reports of existing data processing pipelines to improve the interoperability and reuse of metagenomics data.
In conclusion, small research groups and data stewards responsible for organizing data in local or large sequencing facilities may use our guidelines to bridge the gap between wet-lab and dry-lab researchers. Our guidelines may help to improve the application of the FAIR principles in metagenomics, as we concentrate on the interoperability and reuse aspects of genome sequencing data.