Poster presentation
P-III-0852

The sky is the limit: a cloud-based proteomics platform for the masses

Schedule

Date: Wed., 23 Oct.

Time: 13:00 – 13:00

Speaking time: 0 min

Discussion time: 0 min

Location / Stream:

Data Integration: With Bioinformatics to Biological Knowledge

Session

Data Integration: With Bioinformatics to Biological Knowledge

Topic

Data Integration: With Bioinformatics to Biological Knowledge

Contributors

Daniel Zolg (Garching / DE), Markus Schneider (Garching / DE), Patroklos Samaras (Garching / DE), Samia Ben Fredj (Garching / DE), Florian Seefried (Garching / DE), Dulguun Bold (Garching / DE), Layla Eljagh (Garching / DE), Tobias Schmidt (Garching / DE), Siegfried Gessulat (Garching / DE), Martin Frejno (Garching / DE)

Abstract

Background: Laboratories dealing with bottom-up proteomics data often encounter computational hurdles in the journey from raw data to conclusive insights. Challenges arise from the absence of automated pipelines and disjointed local infrastructure for data storage, processing, systematic result management, and interpretation. The advent of fast-scanning instruments exacerbates these issues by overwhelming local infrastructure with a multitude of files and large raw data sizes. Here, we introduce a highly scalable, fully automatable, cloud-based proteomics platform designed to streamline the entire workflow.

Methods: Our cloud-native platform comprises microservices operated on AWS™ and orchestrated by Kubernetes. Users can access the platform through either a command-line client or a browser-based interface, both of which interact with an API governing all platform functionality. Raw data are processed with Chimerys 4 on an elastic compute cluster. Results are stored in a data lake and can be explored directly in the browser or downloaded. Metadata annotation facilitates navigation and contextualization of large numbers of files. Platform access is available through subscription or self-hosted deployment.
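
To illustrate how tag-based metadata annotation can make many files navigable, the following is a minimal sketch of an inverted index from tags to file names. It is an illustration only: the class `TagIndex`, its methods, the file names, and the tags are all hypothetical and do not represent the platform's API or its internal data model.

```python
from collections import defaultdict


class TagIndex:
    """Toy inverted index from tags to file names (hypothetical, not the platform's API)."""

    def __init__(self):
        # Maps a lower-cased tag to the set of file names annotated with it.
        self._files_by_tag = defaultdict(set)

    def annotate(self, file_name, *tags):
        """Attach one or more tags to a file; tags are matched case-insensitively."""
        for tag in tags:
            self._files_by_tag[tag.lower()].add(file_name)

    def search(self, *tags):
        """Return the sorted file names that carry every requested tag."""
        if not tags:
            return []
        # .get avoids creating empty entries for unknown tags.
        matches = [self._files_by_tag.get(tag.lower(), set()) for tag in tags]
        return sorted(set.intersection(*matches))


# Hypothetical example files and tags.
index = TagIndex()
index.annotate("run_001.raw", "DIA", "plasma", "cohort-A")
index.annotate("run_002.raw", "DDA", "plasma")
index.annotate("run_003.raw", "DIA", "tissue", "cohort-A")

dia_cohort_a = index.search("dia", "cohort-a")
```

Here `dia_cohort_a` holds the two files annotated with both tags. Requiring all tags to match (set intersection) is one possible search semantics; a real system might also support partial matches or key-value metadata.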

Results: We present a comprehensive, managed solution for proteomics data management, obviating the need for user-managed pipelines and infrastructure. The platform offers an intuitive web interface for collaborative data upload, management, and processing. File transfer occurs at speeds of up to 100 MB/s into scalable object storage. Raw data can be annotated with metadata via a searchable tag system, simplifying organization and retrieval. A scalable compute cluster enables simultaneous processing of DDA, DIA, and PRM data from thousands of files. The platform is algorithm-independent, currently supporting Chimerys 4.0 with plans for additional search engines. We demonstrate scalability by processing 1,000 files with no significant increase in processing time compared with single-file processing. Processed data can be organized using the same tag system employed for raw data, with the processing overview providing immediate insight into key parameters for data quality assessment. A fast post-processing workflow combines individually searched raw files, facilitating longitudinal data acquisition and processing without overhead. Results can be accessed via API- or browser-based download, direct API access to the result data lake, browser-based data exploration, or a customizable visualization dashboard featuring common data analyses and visualizations.

In conclusion, this managed, automated proteomics data pipeline promises to streamline the journey from raw data to insights, particularly benefiting laboratories lacking the resources to develop and maintain in-house solutions.
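
As a back-of-the-envelope reading of the transfer rate and scalability figures reported in the Results, the sketch below estimates a lower bound on upload time and the number of sequential processing rounds for a batch of files. It rests on assumptions not stated in the abstract: the 2,000 MB file size and the worker counts are hypothetical, every file is assumed to take the same time to process, and scheduling overhead is ignored. How the platform's cluster actually scales is not described here.

```python
import math


def upload_seconds(total_mb, rate_mb_per_s=100.0):
    """Lower bound on upload time at a sustained transfer rate (default: the stated 100 MB/s peak)."""
    return total_mb / rate_mb_per_s


def processing_rounds(n_files, n_workers):
    """Sequential rounds needed if each worker processes one file at a time (idealized model)."""
    return math.ceil(n_files / n_workers)


# Hypothetical 2,000 MB raw file at the stated peak rate.
seconds_for_one_file = upload_seconds(2000)

# Idealized elastic cluster: one worker per file, so 1,000 files need as many rounds as one file.
rounds_elastic = processing_rounds(1000, 1000)

# Fixed-size cluster of 10 workers (assumed figure) for comparison.
rounds_fixed = processing_rounds(1000, 10)
```

Under these assumptions the elastic case needs a single round regardless of batch size, which is the behavior one would expect behind a claim of near-constant processing time, whereas a fixed pool of 10 workers would need 100 rounds for the same batch.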