  • Invited talk
  • WS1.001-invited

Putting your data into the void

Appointment

Location / Stream:
chromium

Session

Data management

Topic

  • Workshop 1: Data management

Authors

Clemens Mangler (Vienna / AT), Jani Kotakoski (Vienna / AT)

Abstract

Abstract text (incl. figure legends and references)

With rapid advances in detector technology and computing resources, modern microscopes produce massive amounts of data. Additionally, advanced sample preparation often involves multiple steps on multiple devices, each generating additional data. Data management should therefore be considered a crucial part of today's research infrastructure. This led to the publication of guiding principles for scientific data management built on four fundamental cornerstones: Findability, Accessibility, Interoperability and Reusability (FAIR) [1].

To address these needs, we created a system that handles most routine data management tasks automatically, without getting in the researchers' way. It is implemented as a file storage infrastructure that automatically collects data from various experimental devices, including electron microscopes, optical microscopes, sample preparation devices and mechanical testing systems. Raw research data is considered immutable and is therefore stored read-only on a scalable distributed storage system [2]. The data can be accessed through a front-end server over a connection secured with asymmetric cryptography.
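The ingestion step above can be sketched as follows. This is a minimal illustration of the read-only policy for raw data, not the actual implementation; the function name, the content-addressed directory layout and the use of file permissions to enforce immutability are assumptions for the sake of the example.

```python
import hashlib
import shutil
import stat
from pathlib import Path

def ingest(raw_file: Path, archive_root: Path) -> Path:
    """Copy a newly acquired file into the archive and mark it read-only.

    The file is stored under a content-derived subdirectory (hypothetical
    layout), so identical acquisitions are de-duplicated, and all write
    permissions are removed to enforce the immutability of raw data.
    """
    digest = hashlib.sha256(raw_file.read_bytes()).hexdigest()
    target_dir = archive_root / digest[:2] / digest
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / raw_file.name
    shutil.copy2(raw_file, target)
    # Strip all write bits: owner, group and others may only read.
    target.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return target
```

In a real deployment such a routine would watch acquisition directories and write to the distributed file system; here a plain path stands in for that storage back end.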

Metadata is handled via an electronic lab book system [3]. This system is accessible via standard web browsers and is used for documenting experimental steps together with the persons, devices and samples involved in carrying them out. The lab book system also serves as a sample database. We are currently testing an automatic sample tracking workflow based on QR codes and a distributed database populated from the lab book entries.
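The sample-tracking idea can be sketched as follows: each sample receives a stable identifier that is printed as a QR code, and scanning the code resolves to the corresponding lab book record. The class, the ID format and the URL scheme below are assumptions for illustration, not the group's actual workflow.

```python
import uuid

class SampleRegistry:
    """Toy in-memory stand-in for the lab-book-backed sample database."""

    def __init__(self, labbook_base_url: str):
        self.base_url = labbook_base_url
        self._samples: dict[str, dict] = {}

    def register(self, description: str) -> str:
        """Create a sample record and return the ID to encode in a QR code."""
        sample_id = uuid.uuid4().hex[:12]
        self._samples[sample_id] = {"description": description}
        return sample_id

    def qr_payload(self, sample_id: str) -> str:
        """The URL a scanned QR code resolves to (assumed URL scheme)."""
        return f"{self.base_url}/samples/{sample_id}"

    def lookup(self, sample_id: str) -> dict:
        return self._samples[sample_id]
```

In practice the registry would be the distributed database populated from lab book entries; the QR payload simply needs to be a stable link into it.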

A complete research data management system also has to cope with data processing. To achieve a reusable and well-documented data processing and analysis workflow, we use GitLab [4] to track the development of our analysis tools and manuscripts. The GitLab server also acts as an issue tracker for the administration of all involved hardware and software assets.
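A GitLab-tracked analysis repository is typically documented by a CI configuration; the fragment below is an illustrative sketch of how tests and a manuscript build could be automated. The job names, images and commands are assumptions, not the group's actual configuration.

```yaml
# Illustrative .gitlab-ci.yml: run the analysis test suite and
# build the manuscript PDF on every push (assumed setup).
stages:
  - test
  - build

run-analysis-tests:
  stage: test
  image: python:3.11
  script:
    - pip install -r requirements.txt
    - pytest

build-manuscript:
  stage: build
  image: texlive/texlive
  script:
    - latexmk -pdf manuscript.tex
  artifacts:
    paths:
      - manuscript.pdf
```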

To set up a future-proof, independent system and to avoid vendor lock-in, care was taken to use only free and open-source software [5].

In its current state, the system is used internally in our work group. A workflow for publishing data for external use via the Repository for Permanent Hosting, Archiving and Indexing of Digital Resources and Assets (PHAIDRA) at the University of Vienna is planned [6].

References

[1] M. Wilkinson, M. Dumontier, I. Aalbersberg, et al., "The FAIR Guiding Principles for scientific data management and stewardship", Sci Data 3, 160018, 2016, doi:10.1038/sdata.2016.18
[2] S. A. Weil, S. A. Brandt, E. L. Miller, D. D. Long, C. Maltzahn, "Ceph: a scalable, high-performance distributed file system", In Proceedings of the 7th symposium on Operating systems design and implementation (OSDI '06). USENIX Association, USA, 307–320, 2006
[3] N. Carp, A. Minges, M. Piel, "eLabFTW: An open source laboratory notebook for research labs", J. Open Source Softw., 2(12), 146, 2017
[4] P. Choudhury, K. Crowston, L. Dahlander, et al., "GitLab: work where you want, when you want", J. Org. Design 9, 23, 2020, doi:10.1186/s41469-020-00087-8
[5] R. M. Stallman, "GNU's Bulletin, Volume 1 Number 1". https://gnu.org/bulletins/bull1.txt. p. 8. (accessed 2022-07-28)
[6] University of Vienna, https://phaidra.univie.ac.at/ (accessed 2022-07-28).

  • © Conventus Congressmanagement & Marketing GmbH