Maik Pietzner (Berlin / DE), Carl Beuchel (Berlin / DE), Summaira Yasmeen (Berlin / DE), Burulça Uluvar (Berlin / DE), Martijn Zoodsma (Berlin / DE), Mine Köprülü (Cambridge / GB), Julia Carrasco-Zanini (London / GB), Claudia Langenberg (Berlin / DE; Cambridge / GB; London / GB)
Affinity-based proteomic techniques now measure thousands of proteins circulating in blood at population scale, but only ~10% of these have established roles in blood, leaving the sources and relevance of variation for most proteins poorly understood. Here, we used machine learning to systematically identify and quantify major predictors of plasma levels of ~3,000 protein targets among 43,240 participants of the UK Biobank. From over 1,800 participant and technical characteristics, we identified 427 diverse factors, both modifiable and non-modifiable, that cumulatively explained on average 23.7% (range: 0.001%-79.7%) of the variation in plasma levels of 2,846 protein targets (97.4% of all protein targets). We demonstrate that protein targets segregate into 11 clusters, all but one of which are predominantly explained by a single predictor of plasma protein variance, including platelet count (n=635), germline genetic variation (n=372), renal function (n=199), liver function (n=22), low-grade inflammation (n=208), ancestry (n=150), and sex (n=10). With few exceptions (2.7%), we obtained largely similar explained variances in plasma protein levels across sexes and ancestries (British, Central/South Asians, and Africans), but with evidence that selected participant characteristics differed by sex or ancestry. Our results highlight new links between tissues, cell types, and disease, as well as potential (adverse) effects of medications. For example, intake of warfarin or clopidogrel explains variance in plasma levels of proteins specifically expressed in fibroblasts. In summary, we provide a data-driven approach to better understand the factors that shape the human plasma proteome and to help guide blood-based protein biomarker discovery.