  • Abstract talk
  • WS1.005

Electronic lab notebooks compiling big datasets for machine learning analysis

Appointment

Location / Stream:
chromium

Session

Data management

Topic

  • Workshop 1: Data management

Authors

Martin Held (Geesthacht / DE), Catriona Eschke (Geesthacht / DE), Fabian Kirchner (Geesthacht / DE), Milosz Meller (Geesthacht / DE), Sayed Ahmad Sahim (Geesthacht / DE), Nicole Jung (Karlsruhe / DE), Nick Garabedian (Karlsruhe / DE), Ilia Bagov (Karlsruhe / DE), Christian Greiner (Karlsruhe / DE), Frederic Bock (Geesthacht / DE), Benjamin Klusemann (Geesthacht / DE), Florian Wieland (Geesthacht / DE), Tomer Fried (Geesthacht / DE), Roland C. Aydin (Geesthacht / DE), Fabian Isensee (Heidelberg / DE)

Abstract


Electronic lab notebooks (ELNs) serve as a means to gather analog metadata, e.g. experimental parameters, that would otherwise be hard to digitize. Furthermore, they are a key prerequisite for comprehensive documentation of research processes and for the reuse of research data. However, different systems are often used within the same research institution or community, especially when covering a long, interdisciplinary process chain. Using different systems, each addressing discipline-specific requirements, provides broad functionality but creates challenges because the metadata often lack interoperability. We are addressing this lack of interoperability for the two ELNs Herbie and Chemotion with an API-based data exchange. The envisaged reduction of boundaries between the disciplines of chemical synthesis and process engineering in membrane research will enable the creation of large and coherent datasets, including microscopy image data.

Once data management platforms, e.g. ELNs, are in use, the semantic information therein can only be exploited when terms are clearly defined in a glossary. To make metadata truly interoperable, glossaries need to be transformed into ontologies, which structure knowledge based on formal logic, reduce ambiguity and enable machine-driven reasoning. We are establishing a glossary and an ontology for membrane research, alongside a guide that describes best practices for building them. In combination with existing efforts, e.g. the Helmholtz Electron Microscopy Glossary, our ontology will enable machine-driven interpretation of the linguistic data in our ELNs.
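To illustrate what machine-driven reasoning over an ontology can mean in its simplest form, the following minimal sketch stores a few glossary terms with an "is a" relation and infers broader terms by following that relation transitively. The term names and the `IS_A` table are hypothetical examples, not entries from the membrane glossary described here, and a real ontology would typically be expressed in a formal language such as OWL rather than a Python dictionary.

```python
# Hypothetical glossary entries: term -> direct broader term ("is a" relation).
# These names are illustrative assumptions, not taken from the actual glossary.
IS_A = {
    "isoporous block copolymer membrane": "block copolymer membrane",
    "block copolymer membrane": "polymer membrane",
    "polymer membrane": "membrane",
    "SEM image": "electron microscopy image",
    "electron microscopy image": "image",
}


def ancestors(term, is_a=IS_A):
    """Return all broader terms reachable by following the is-a chain."""
    chain = []
    seen = {term}
    while term in is_a:
        term = is_a[term]
        if term in seen:  # guard against accidental cycles in the table
            break
        seen.add(term)
        chain.append(term)
    return chain


def is_kind_of(term, broader, is_a=IS_A):
    """True if `broader` can be inferred as a broader term of `term`."""
    return broader in ancestors(term, is_a)
```

With such a structure, a query for all records about "membrane" could also return records annotated only with "isoporous block copolymer membrane", even though that link was never stated directly.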

Following the assembly of large datasets in ELNs, we address the ambiguous relationship between pore morphology and fabrication parameters of isoporous block copolymer membranes using machine learning methods. First, the hexagonality of the pore arrangement was extracted from 1800 SEM images as a quality descriptor. Analyzing the impact of the fabrication parameters on this descriptor with a neural network revealed that the fraction of the main solvent and the molecular weight have the most pronounced effect on the hexagonal arrangement of pores in the membrane.
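The abstract does not state how hexagonality was computed. One common way to quantify hexagonal order, shown here purely as an assumed illustration, is the local bond-orientational order parameter |ψ6| evaluated on pore centre coordinates: it equals 1 for a pore whose nearest neighbours sit on a perfect hexagon and drops towards 0 as the arrangement deviates. The function names, the nearest-neighbour selection, and the synthetic lattice below are all hypothetical and are not the authors' pipeline.

```python
import cmath
import math


def psi6(points, index, k=6):
    """Local bond-orientational order |psi_6| of points[index].

    Uses the k nearest neighbours (an assumption; Voronoi or Delaunay
    neighbours are another common choice). Returns a value in [0, 1].
    """
    x0, y0 = points[index]
    others = [
        (math.hypot(x - x0, y - y0), x, y)
        for j, (x, y) in enumerate(points)
        if j != index
    ]
    others.sort(key=lambda item: item[0])
    neighbours = others[:k]
    # Sum exp(6i * theta) over the bond angles to each neighbour.
    total = sum(
        cmath.exp(6j * math.atan2(y - y0, x - x0)) for _, x, y in neighbours
    )
    return abs(total) / len(neighbours)


def triangular_lattice(n, spacing=1.0):
    """Synthetic, perfectly hexagonal pore centres for demonstration only."""
    points = []
    for row in range(-n, n + 1):
        for col in range(-n, n + 1):
            x = spacing * (col + 0.5 * row)
            y = spacing * row * math.sqrt(3) / 2
            points.append((x, y))
    return points
```

In practice the pore centres would first have to be segmented from each SEM image, and the per-pore values could then be averaged into a single descriptor per image; both steps are outside this sketch.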

In summary, using ELNs throughout the entire experimental workflow creates holistic, structured datasets that can readily highlight critical parameters when analyzed by machine learning methods. Moreover, enhancing the value of the metadata in ELNs with an ontology creates even more room for digital analyses.

  • © Conventus Congressmanagement & Marketing GmbH