Cecilia Jensen (Freising / DE), Amirhossein Sakhteman (Freising / DE), Firas Hamood (Freising / DE), Julia Woortman (Freising / DE), Annika Schneider (Freising / DE), Bernhard Küster (Freising / DE), Matthew The (Freising / DE)
In precision oncology, DNA and RNA sequencing are increasingly applied to profile rare and difficult tumors. However, the therapeutically relevant (phospho)proteome is rarely included. This can be attributed in part to the need for deep, reproducible profiling and to the absence of comprehensive data analysis software that covers the entire workflow from raw data to patient-specific oncogenic activity.
We integrated an end-to-end clinical proteomics workflow into two existing molecular tumor board programs and have profiled >1400 samples from patients with rare and difficult cancers over >3 years. Here, we present the automated pipeline for analyzing (phospho)proteomic data, which enables therapeutic recommendations within a turnaround time of two weeks.
Methods
Each week, 16 patients (2 TMT-11 batches) are processed, yielding expression data for ~8,000 proteins and ~30,000 phosphosites per patient. Raw data are searched per batch and then combined into a single data matrix using several in-house tools that have been optimized to handle large data volumes (>4 TB of raw data). Further tools then enhance data completeness and reduce batch effects.
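The in-house tools are not described further here. Purely as an illustration of what combining batches and reducing batch effects can look like, the sketch below merges per-batch tables into one matrix and applies per-protein median centering within each batch. Median centering is a common, simple stand-in and not necessarily the method used in this work; the function name `median_center_batches` and the nested-dictionary layout are assumptions made for the example.

```python
from statistics import median


def median_center_batches(batches):
    """Merge batches into one matrix after per-protein median centering.

    batches: dict batch_id -> dict patient_id -> dict protein -> log2 intensity
             (hypothetical layout; proteins missing in a patient are omitted).
    Returns: dict patient_id -> dict protein -> centered log2 intensity.
    """
    matrix = {}
    for patients in batches.values():
        # All proteins quantified in at least one patient of this batch.
        proteins = {prot for values in patients.values() for prot in values}
        # Median of each protein over the patients of this batch only.
        batch_median = {
            prot: median(v[prot] for v in patients.values() if prot in v)
            for prot in proteins
        }
        # Subtract the batch median so that batches share a common center.
        for patient_id, values in patients.items():
            matrix[patient_id] = {
                prot: x - batch_median[prot] for prot, x in values.items()
            }
    return matrix


batches = {
    "B1": {"P1": {"EGFR": 10.0}, "P2": {"EGFR": 12.0}},
    "B2": {"P3": {"EGFR": 20.0, "MET": 5.0}, "P4": {"EGFR": 22.0}},
}
matrix = median_center_batches(batches)
print(matrix["P1"]["EGFR"], matrix["P3"]["EGFR"])
```

Note that centering each batch independently also removes genuine differences between batches when their patient composition differs, which is one reason a production pipeline may need a more careful approach.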
Single-patient proteomes are then analyzed at the level of protein abundance and phosphorylation activity, using z-scores calculated across patients to find aberrant activity. This information is also integrated into a scoring strategy for kinase substrates and protein phosphorylation signaling. Moreover, abundances and scores are combined into TUmor Proteome ACtivity (TUPAC) scores for 24 receptor tyrosine kinases that are typically druggable targets.
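The cross-patient z-score can be sketched as follows. This is a minimal illustration assuming log2 intensities and a plain mean and sample standard deviation over all patients in which a protein was quantified; the function name `cohort_z_scores`, the data layout, and the `min_patients` cutoff are hypothetical and not taken from the actual pipeline.

```python
from statistics import mean, stdev


def cohort_z_scores(cohort, patient_id, min_patients=3):
    """Z-score each protein of one patient against the whole cohort.

    cohort: dict patient_id -> dict protein -> log2 intensity
            (hypothetical layout; missing proteins are omitted).
    min_patients: assumed minimum number of quantified patients
                  required to compute a z-score for a protein.
    """
    z = {}
    for protein, value in cohort[patient_id].items():
        # Cohort values for this protein, including the patient itself.
        values = [p[protein] for p in cohort.values() if protein in p]
        if len(values) < min_patients:
            continue  # too few observations for a meaningful spread
        sd = stdev(values)
        if sd == 0:
            continue  # no variation across the cohort
        z[protein] = (value - mean(values)) / sd
    return z


cohort = {
    "P1": {"EGFR": 10.0, "KRAS": 5.0},
    "P2": {"EGFR": 10.0, "KRAS": 6.0},
    "P3": {"EGFR": 10.0},
    "P4": {"EGFR": 14.0},
}
print(cohort_z_scores(cohort, "P4"))
```

In this toy cohort, KRAS is quantified in only two patients and is therefore skipped rather than scored.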
Preliminary data
Our activity scoring relies on relative protein and phosphopeptide quantification against a pan-cancer background cohort, which eliminates the need for healthy controls. We are continuously expanding our patient cohort to increase statistical significance and to broaden coverage of rare cancer types and different oncogenic mechanisms. Currently, the cohort consists of >1400 patient samples from various entities, covering >100 different tumor types from >20 different tissues. More than 1000 prospective samples have been discussed in molecular tumor boards.
To evaluate each patient's proteome efficiently, we created an interactive web portal that contains the cohort data and generates patient-specific reports in Excel format. These reports contain information at the level of phosphopeptides, proteins, kinase activity and our own TUPAC scores. Quantification measures are given as rank, fold change and z-score, each computed relative to the background cohort. The portal further includes tools for batch effect assessment and quality control. We investigated the effects of our normalization procedures and observed a substantial reduction of batch effects. The output of our software has proven to be clinically useful, easily interpretable through the portal, representative of relevant biology, and both robust and reproducible.
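As a rough illustration of the rank and fold change measures reported relative to the background cohort, the sketch below ranks one patient's value among all cohort values of a protein and computes its fold change against the cohort median. The choice of the median as reference, the log2 scale, and the function name `rank_and_fold_change` are assumptions for this example, not the portal's documented definitions.

```python
from statistics import median


def rank_and_fold_change(cohort_values, patient_value):
    """Rank and fold change of one patient for a single protein.

    cohort_values: log2 intensities of the protein across the cohort,
                   including the patient (hypothetical input format).
    Returns (rank, fold_change); rank 1 means highest in the cohort,
    and fold change is taken against the cohort median (an assumption).
    """
    # Rank = 1 + number of cohort values strictly above the patient.
    rank = 1 + sum(v > patient_value for v in cohort_values)
    # A difference on the log2 scale converts to a linear fold change.
    log2_fc = patient_value - median(cohort_values)
    return rank, 2 ** log2_fc


values = [10.0, 11.0, 12.0, 13.0, 14.0]
print(rank_and_fold_change(values, 14.0))
```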