Tara Bartolec (Heidelberg / DE; Sydney / AU), Xabier Vázquez-Campos (Sydney / AU), Alexander Norman (Sydney / AU), Clement Luong (Sydney / AU), Marcus A. Johnson (Sydney / AU), Richard J. Payne (Sydney / AU), Marc Wilkins (Sydney / AU), Joel P. Mackay (Sydney / AU), Jason K.K. Low (Sydney / AU)
Recent advances in structural biology have expanded our ability to determine experimental structures of proteins and complexes, but many proteins remain refractory to these approaches or have not yet been analysed. Machine-learning-based structure modellers now provide highly accurate protein structure predictions for entire proteomes. These modellers are trained on experimental structures available in the Protein Data Bank (PDB), which cover a relatively small subset of proteins, and many of these structures were solved using non-native conditions or sequences. A critical question is therefore whether predicted (and PDB) structures reflect the bona fide structures and complexes formed by proteins in their native environment. We investigate this question using a large-scale cross-linking mass spectrometry (XL-MS) resource generated for the human cell.
To generate a high-density, high-depth XL-MS dataset for human HEK293 cells, we used a multipronged approach. Briefly, we cross-linked four subcellular fractions (nucleus, endoplasmic reticulum, mitochondria, cytosol) using three cross-linkers with orthogonal chemistries (DHSO, DSSO, DMTMM). We then enriched cross-linked peptides by offline size-exclusion chromatography and fractionated them further by high-pH reversed-phase HPLC. Concatenated fractions were analysed by mass spectrometry using either a hybrid MS2-MS3 strategy or MS/MS with EThcD or HCD fragmentation. Cross-linked peptides were identified using XlinkX 2.3 or pLink2, with stringent search parameters and post-hoc filtering to control the false discovery rate to <2% at the unique residue pair (URP) or protein-protein interaction (PPI) level.
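The post-hoc FDR filtering described above can be illustrated with a minimal sketch. It assumes a generic target-decoy approach in which each URP identification carries a score and a decoy flag; the `UrpHit` class, its fields and the `filter_by_fdr` function are hypothetical names introduced for illustration, and this is not the filtering implemented in XlinkX or pLink2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UrpHit:
    """One unique-residue-pair identification (hypothetical structure)."""
    urp: str        # e.g. "P12345:K10-P12345:K42" (illustrative label only)
    score: float    # higher means more confident (assumption)
    is_decoy: bool  # True if the hit matches a decoy sequence


def filter_by_fdr(hits, max_fdr=0.02):
    """Return target hits above the lowest score cut-off that keeps FDR < max_fdr.

    FDR is estimated as decoys / targets among the hits ranked so far,
    which is an assumed, simplified estimator.
    """
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    targets = 0
    decoys = 0
    keep = 0  # number of top-ranked hits that pass the FDR limit
    for position, hit in enumerate(ranked, start=1):
        if hit.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Extend the accepted set whenever the running estimate is below the limit.
        if targets > 0 and decoys / targets < max_fdr:
            keep = position
    return [h for h in ranked[:keep] if not h.is_decoy]


if __name__ == "__main__":
    example = [UrpHit(f"urp{i}", 10.0 - i, False) for i in range(5)]
    example.append(UrpHit("decoy1", 4.5, True))
    example.append(UrpHit("urp5", 4.0, False))
    accepted = filter_by_fdr(example)
    print(len(accepted), "URPs accepted")
```

The same idea could be applied at the PPI level by collapsing URPs to protein pairs before ranking; how scores are aggregated in that case is a design choice not specified here.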
We identified 28,910 URPs representing 4,084 unique proteins and 2,110 unique putative PPIs. Subcellular fractionation before cross-linking significantly improved proteome coverage, whilst the orthogonal reactivities (D/E-D/E, K-K and K-D/E) increased the density of cross-links per protein, especially for intra-protein links. We demonstrate that our resource of URPs confirms and rediscovers existing experimental structures, capturing proteoforms and complexes within their approximate subcellular niches and across a range of conformations. Remarkably, our intra-molecular URPs also largely corroborate thousands of new structures predicted by the next-generation modeller AlphaFold2, including those involving proteins (or regions of proteins) that have not been experimentally resolved, and those lacking any structural precedent. Furthermore, our inter-protein cross-links recapture the topology of well-described complexes and PPIs, whilst supporting or discovering poorly characterised PPIs. Finally, the inter-protein cross-links also help localise PPI interfaces, and we use this information to assess quaternary protein structures modelled with AlphaFold-Multimer.
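One common way to test whether cross-links corroborate a structure is to measure the distance between the linked residues in the model and compare it with an upper bound set by the cross-linker. The sketch below assumes Cα coordinates are already available as a mapping from residue number to (x, y, z) in ångströms; the function names and the 30 Å default cut-off are illustrative assumptions, not values taken from this study.

```python
import math


def ca_distance(coords, res_a, res_b):
    """Euclidean distance between the C-alpha atoms of two residues."""
    return math.dist(coords[res_a], coords[res_b])


def classify_links(coords, links, max_distance=30.0):
    """Split residue pairs into satisfied, violated and unmapped lists.

    coords: dict mapping residue number -> (x, y, z) C-alpha coordinate
    links: iterable of (residue_a, residue_b) pairs from intra-protein URPs
    max_distance: assumed upper bound in angstroms; it depends on the
                  cross-linker and should be chosen per reagent
    """
    satisfied, violated, unmapped = [], [], []
    for res_a, res_b in links:
        if res_a not in coords or res_b not in coords:
            # Residue absent from the model (e.g. unresolved region).
            unmapped.append((res_a, res_b))
        elif ca_distance(coords, res_a, res_b) <= max_distance:
            satisfied.append((res_a, res_b))
        else:
            violated.append((res_a, res_b))
    return satisfied, violated, unmapped


if __name__ == "__main__":
    model = {1: (0.0, 0.0, 0.0), 2: (3.0, 4.0, 0.0), 3: (0.0, 0.0, 50.0)}
    ok, bad, missing = classify_links(model, [(1, 2), (1, 3), (1, 9)])
    print("satisfied:", ok, "violated:", bad, "unmapped:", missing)
```

A violated link does not necessarily mean the model is wrong; it may instead reflect an alternative conformation or an inter-molecular contact, so such cases are better flagged for inspection than discarded.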