Philipp T. Kaulich (Kiel / DE), Kyowon Jeong (Kiel / DE; Tuebingen / DE), Oliver Kohlbacher (Tuebingen / DE), Andreas Tholey (Kiel / DE)
Top-down proteomics (TDP) analyzes intact proteoforms, i.e., all molecular forms in which proteins can be present. Despite significant advances, the proteome-wide identification of proteoforms still faces several challenges, e.g., limited sensitivity and complex spectra. A typical TDP sample preparation workflow for analyzing cellular proteoforms includes cell lysis, prefractionation of proteoforms smaller than approximately 30 kDa, optional reduction/alkylation, and multidimensional fractionation. A plethora of sample preparation strategies has been presented in the literature. Here, we systematically examined the influence of the individual sample preparation steps on the identification of proteoforms and proteins by TDP from human Caco-2 cells. Besides the number of identifications, we evaluated their physicochemical properties and the occurrence of artificially introduced modifications.
Each step in sample preparation influenced the number, confidence, and physicochemical properties (mass, isoelectric point, GRAVY score) of the identifications. Lysis conditions using acidic pH tended to extract a higher number of proteoforms with an alkaline pI and vice versa. Moreover, the lysis conditions affected the occurrence of artificially modified proteoforms (covalent modifications/adducts and artificial truncation). Reduction and alkylation increased the number of proteoform identifications and their residue cleavages, with the effect on cysteine-containing proteoforms being the most significant. However, disulfide reduction resulted in the identification of artificially introduced proteoforms due to hydrolysis of peptide bonds C-terminal to aspartate residues, likely caused by the elevated temperature during the reduction. Moreover, information about disulfide bridges is lost. Prefractionation of proteoforms smaller than 30-50 kDa could significantly increase the number of identifications compared to full lysate analyses. However, different biases for subgroups of proteoforms (e.g., small, acidic, and hydrophilic) were observed depending on the sample preparation strategy. Specific steps in sample preparation led to the generation of artificial proteoforms, e.g., the use of formic acid and β-mercaptoethanol resulted in covalent adducts, and elevated temperatures can result in hydrolysis of peptide bonds C-terminal to aspartate residues. Multidimensional fractionation schemes significantly increased the number and confidence of proteoform identifications, but at the cost of measurement time. Furthermore, the longer sample handling time increases the risk of artificially introduced modifications and hydrolysis events. In summary, this study provides a comprehensive overview of the influence of different sample preparation steps in TDP on proteoform and protein identifications.
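Two of the quantities discussed above can be illustrated with a minimal Python sketch. It is not the analysis pipeline used in this study. The GRAVY score is computed as the mean Kyte-Doolittle hydropathy of a sequence, and a truncated proteoform is checked for termini consistent with hydrolysis C-terminal to aspartate. The function names, the 1-based inclusive coordinates, and the example sequence are hypothetical and chosen only for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Return the GRAVY score: mean hydropathy over all residues."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence.upper()) / len(sequence)


def asp_cleavage_termini(parent: str, start: int, end: int) -> tuple[bool, bool]:
    """Flag termini consistent with hydrolysis C-terminal to Asp.

    `start` and `end` are 1-based, inclusive positions of the truncated
    proteoform within the parent sequence (a hypothetical convention).
    Returns (n_term_flag, c_term_flag):
      n_term_flag: the residue directly before the new N-terminus is Asp
      c_term_flag: the new C-terminal residue is Asp and is not the
                   parent's own C-terminus
    """
    if not (1 <= start <= end <= len(parent)):
        raise ValueError("start/end outside the parent sequence")
    n_term_flag = start > 1 and parent[start - 2] == "D"
    c_term_flag = end < len(parent) and parent[end - 1] == "D"
    return n_term_flag, c_term_flag


if __name__ == "__main__":
    parent = "MKDAGHDLLK"  # made-up sequence, not from the study
    print(round(gravy(parent), 3))
    print(asp_cleavage_termini(parent, 4, 10))
    print(asp_cleavage_termini(parent, 1, 7))
```

A flagged terminus only indicates that a truncation is compatible with Asp-directed hydrolysis; it cannot distinguish an artifact from a biologically generated proteoform.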
For in-depth characterization of cells/organisms, combining multiple complementary sample preparation methods provides a straightforward approach to increase the number of identifications. Each variant of a sample preparation step has distinct advantages and limitations, and the specific research objectives should guide the selection of sample preparation steps based on the presented results.