Tine Claeys (Ghent / BE), Juan Vizcaino (Cambridge / GB), Kris Gevaert (Ghent / BE), Lennart Martens (Ghent / BE)
Tissue-specific characteristics are influenced by numerous factors, with proteins playing a pivotal role. Alterations in post-translational modifications (PTMs) can modify the tissue-specific functionality of proteins, but this has largely been explored in a restricted set of proteins and PTMs. New open modification search engines have expanded the possibilities for studying PTMs from a more unbiased perspective. In previous research (1), we trained a machine-learning model on public datasets of healthy human tissue, revealing tissue-specific protein patterns. Applying this model to data from diseased tissue demonstrated its effectiveness in biomarker detection, aided by explainable AI highlighting the proteins contributing to classification.
To evaluate the tissue-predictive capability of PTMs in a similar manner, we reprocessed data from the human draft map of the proteome (2) with the ionbot open modification search engine. These data were used in three ways. First, a model was trained on only the protein information, achieving a high F1-score of 71%. Second, we added the PTM information to the protein data, resulting in a more nuanced view and a 4% improvement of the F1-score. Finally, only the PTM information was used for training, which resulted in a surprising F1-score of 60%. To further test the PTM classification capabilities, the dataset from (3) was used for evaluation. This resulted in an accurate classification of 65% of tissues, supporting the hypothesis that tissues exhibit specific PTM patterns. Explainable AI methods revealed the contribution of PTMs to each tissue, showing strong PTM patterns in metabolically active tissues like the liver, brain, T-cells, and monocytes, driven notably by glycosylation patterns.
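As a reminder of how an F1-score summarizes a multi-class tissue classifier, the following minimal Python sketch computes a macro-averaged F1 from true and predicted tissue labels. The averaging scheme is an assumption made for illustration, since the abstract does not state which variant was reported, and the function name `macro_f1` and the toy labels are hypothetical.

```python
from collections import Counter


def macro_f1(true_labels, predicted_labels):
    """Unweighted mean of the per-class F1 over all classes in the true labels.

    Assumption: macro averaging; the reported scores may use another variant.
    """
    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == pred:
            true_pos[truth] += 1
        else:
            false_pos[pred] += 1   # predicted class gains a false positive
            false_neg[truth] += 1  # true class gains a false negative

    scores = []
    for cls in sorted(set(true_labels)):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * true_pos[cls] + false_pos[cls] + false_neg[cls]
        scores.append(2 * true_pos[cls] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented for illustration only
truth = ["liver", "liver", "brain", "brain", "heart", "heart"]
predicted = ["liver", "liver", "brain", "heart", "heart", "heart"]
score = macro_f1(truth, predicted)
```

Macro averaging weights every tissue equally, so a model that classifies a few tissues well and the rest poorly receives a low overall score.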
This observation was further confirmed when studying the tissue-specificity of proteoforms of widespread proteins using the same dataset (2). We selected proteoforms from proteins identified in 75% of tissues, such as actin, filamin, and glyceraldehyde-3-phosphate dehydrogenase. Using this dataset, containing a minimal amount of information with 384 proteoforms from 26 proteins, the model achieved an F1-score of 38% across 16 tissues, effectively classifying highly metabolic tissues but performing poorly on the other tissues.
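The selection of widespread proteins can be sketched as a simple prevalence filter. This is a minimal illustration rather than the pipeline used in the study: it assumes the criterion means "identified in at least 75% of tissues", and the function name `widespread_proteins`, the input layout, and the toy identifications are all hypothetical.

```python
from collections import Counter


def widespread_proteins(tissue_to_proteins, min_fraction=0.75):
    """Return proteins identified in at least `min_fraction` of the tissues.

    `tissue_to_proteins` maps a tissue name to the proteins identified in it.
    Assumption: the threshold is inclusive (>= 75%).
    """
    n_tissues = len(tissue_to_proteins)
    tissue_counts = Counter()
    for proteins in tissue_to_proteins.values():
        tissue_counts.update(set(proteins))  # count each protein once per tissue
    return sorted(
        protein
        for protein, count in tissue_counts.items()
        if count / n_tissues >= min_fraction
    )


# Toy identifications, invented for illustration only
identifications = {
    "liver": ["ACTB", "GAPDH", "ALB"],
    "brain": ["ACTB", "GAPDH"],
    "heart": ["ACTB", "GAPDH", "FLNA"],
    "lung": ["ACTB", "FLNA"],
}
selected = widespread_proteins(identifications)
```

Only the proteoforms of the proteins that pass this filter would then be used as model features.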
In conclusion, our research highlights the potential of combining open modification searches with machine learning and explainable AI to study tissue-specific PTMs and proteoforms. This approach not only enhances our understanding of tissue-specific proteomes but also opens new avenues for biomarker discovery and disease characterization. The identification of distinct PTM patterns in metabolically active tissues underscores the importance of PTMs, providing a clear direction for future investigations into their specific roles.
(1) Claeys, T. et al. J Proteome Res (2023).
(2) Kim, M.-S. et al. Nature (2014).
(3) Wang, D. et al. Mol Syst Biol (2019).