Haichao Zhou (Shenzhen / CN), Linyuan Fan (Shenzhen / CN), Guixue Hou (Shenzhen / CN), Siqi Liu (Shenzhen / CN)
Spatial omics is a frontier field that provides rich information on gene expression and function together with their spatial distribution. The technology and bioinformatics of spatial transcriptomics have reached maturity, whereas those of spatial proteomics are still in their infancy. Because sampling a spatial proteome from tissue, and identifying and quantifying peptides, differ substantially from the procedures of spatial transcriptomics, every step of data analysis requires a proteome-specific approach. At the same time, the strategies developed for mining spatial information from transcriptomes, and for analysing single-cell transcriptome datasets, can serve as references for developing informatics tools for spatial proteomics. Here we present a data-analysis workflow for spatial proteomics, implemented as an R package termed SPverse.
SPverse consists of four layers of data analysis. To obtain high-quality proteomic data, low-quality data were removed according to protein identification numbers and density distributions, and the filtered data were then globally normalized. Because proteome differences arising from inter-individual variation interfere with sample clustering across the whole proteomes of a cohort, Harmony was used to attenuate these effects. After this batch correction, consensus clustering was applied iteratively to the corrected data until a stable result was reached. To identify the proteomic characteristics of specific regions, representative proteins were extracted by comparing regions under the guidance of HE-stained images, and a protein-feature index for each region was then computed by GSVA. The protein-feature scores were mapped onto the tissue slides to construct a spatial proteome image. To compare changes in spatial features in response to a stress, the differentially expressed proteins (DEPs) related to the stress were first derived from volcano-plot analysis; the DEPs were then submitted to GSVA to generate a protein-feature index that serves as a scoring system for the spatial distribution of stress-related proteomes.
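The quality-control and normalization step described above can be illustrated with a minimal sketch. It is written in Python with the standard library only and is not the SPverse implementation. The function names, the `min_proteins` threshold, the toy intensities, and the choice of log2 median centring as the "global normalization" are all assumptions made for illustration. The density-distribution filter is not reproduced here.

```python
import math
from statistics import median

def filter_spots(spots, min_proteins):
    """Keep spots with at least `min_proteins` quantified proteins.

    `spots` maps spot name -> {protein: intensity or None}.
    `min_proteins` is a hypothetical cutoff, not a value from the abstract.
    """
    return {
        name: profile
        for name, profile in spots.items()
        if sum(v is not None for v in profile.values()) >= min_proteins
    }

def median_normalize(spots):
    """Log2-transform intensities and shift each spot to a common median.

    Median centring is an assumed stand-in for the unspecified
    global normalization mentioned in the text.
    """
    logged = {
        name: {p: math.log2(v) for p, v in profile.items()
               if v is not None and v > 0}
        for name, profile in spots.items()
    }
    spot_medians = {name: median(profile.values())
                    for name, profile in logged.items()}
    global_median = median(spot_medians.values())
    return {
        name: {p: v - spot_medians[name] + global_median
               for p, v in profile.items()}
        for name, profile in logged.items()
    }

# Toy data: three spots, three proteins (values are invented).
raw = {
    "s1": {"A": 8, "B": 16, "C": 32},
    "s2": {"A": 16, "B": 32, "C": 64},
    "s3": {"A": 4, "B": None, "C": None},  # too few identifications
}

kept = filter_spots(raw, min_proteins=2)
normalized = median_normalize(kept)
print(sorted(kept))
print(normalized)
```

In this toy case the sparsely identified spot is dropped, and the two remaining spots, which differ only by a constant loading factor, end up with identical normalized profiles.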
In the proteome database of FFPE micro-tissues, a total of 1144 proteomic profiles were collected, comprising 144 from specifically selected spots and 1000 from continuously sampled spots. After pre-processing of the raw data, 138 specific spots and 971 continuous spots were retained. From the specific spots, 278 feature proteins of tumor and 139 of stroma were defined, and tumor scores were generated from these feature proteins by GSVA. Based on their proteomic profiles, the continuous spots were broadly divided into 6 groups by consensus clustering, and the spatial distribution of the tumor scores across the tissue was delineated. Notably, the proteomic clusters overlapped spatially with the distribution of tumor scores. Furthermore, the proteins in the tumor or stroma tissues of patients who were sensitive or insensitive to chemotherapy were explored using the Wilcoxon test and t-test.
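To make the idea of a per-spot tumor score concrete, the sketch below computes a simple signature score from two feature-protein sets. It is a deliberately simplified stand-in and is not GSVA: each protein is z-scored across spots, and the score is the mean z-score of the tumor features minus the mean z-score of the stroma features. The protein names, values, and function names are hypothetical.

```python
from statistics import mean, pstdev

def zscore_by_protein(matrix):
    """Z-score each protein across spots.

    `matrix` maps spot name -> {protein: normalized abundance}.
    Proteins with zero variance get a z-score of 0.
    """
    proteins = set().union(*(profile.keys() for profile in matrix.values()))
    z = {spot: {} for spot in matrix}
    for protein in proteins:
        values = [profile[protein] for profile in matrix.values()
                  if protein in profile]
        mu, sd = mean(values), pstdev(values)
        for spot, profile in matrix.items():
            if protein in profile:
                z[spot][protein] = 0.0 if sd == 0 else (profile[protein] - mu) / sd
    return z

def signature_score(z, tumor_features, stroma_features):
    """Mean z of tumor features minus mean z of stroma features, per spot."""
    scores = {}
    for spot, profile in z.items():
        up = [profile[p] for p in tumor_features if p in profile]
        down = [profile[p] for p in stroma_features if p in profile]
        scores[spot] = (mean(up) if up else 0.0) - (mean(down) if down else 0.0)
    return scores

# Toy data: two spots, one tumor and one stroma feature protein (invented).
abundance = {
    "spot1": {"T1": 10, "S1": 2},
    "spot2": {"T1": 2, "S1": 10},
}

z = zscore_by_protein(abundance)
scores = signature_score(z, tumor_features={"T1"}, stroma_features={"S1"})
print(scores)
```

Scores computed this way can be placed at each spot's coordinates to draw a score map. A rank-based method such as GSVA is less sensitive to outlying intensities than this mean-of-z-scores shortcut.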