  • Poster presentation
  • P-III-0820

OutCyte UPS 2.0: a representation learning based new tool for leaderless protein secretion prediction

Appointment

Location / Stream:
Data Integration: With Bioinformatics to Biological Knowledge

Topic

  • Data Integration: With Bioinformatics to Biological Knowledge

Authors

Jan van Grimbergen (Duesseldorf / DE), Sara Schulte (Duesseldorf / DE), Gunnar Klau (Duesseldorf / DE), Kai Stühler (Duesseldorf / DE), Gereon Poschmann (Duesseldorf / DE)

Abstract

To communicate with their environment, cells have developed different strategies to release proteins into the extracellular space. Classical protein secretion via the endoplasmic reticulum / Golgi route relies on N-terminal signal peptides. While the prediction of signal peptides and classically secreted proteins has been well established by several groups, the prediction of leaderless secreted proteins remains challenging because a clear sequence motif is lacking and the number of proven leaderless secreted proteins is limited. One available tool that addresses these challenges is OutCyte, previously developed by our group. OutCyte is a feature-based deep-learning method in which the features are calculated from the protein sequence. To compensate for the limited number of known leaderless secreted proteins, OutCyte used experimental secretome data for training.

Here, we present OutCyte UPS 2.0, a new version of OutCyte with two major improvements. First, we extended the sources of experimentally proven secreted proteins from 10 to 53 different cell types. Second, instead of a feature-based model, which may omit relevant features, we developed an artificial neural network based on a representation learning approach that requires no manually defined features. For this, we used representations from the transformer protein language model ESM2 (Evolutionary Scale Modeling).
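
As a rough illustration of the representation learning idea, the sketch below turns per-residue embeddings into one fixed-length protein vector and passes it to a small classifier head. The abstract does not describe the actual architecture, so everything here is an assumption: the mean pooling step, the single logistic unit, the embedding size, the example sequence, and the random numbers standing in for real ESM2 output are all hypothetical.

```python
import math
import random


def mean_pool(residue_embeddings):
    """Average per-residue vectors into one fixed-length protein vector."""
    n_residues = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(vec[i] for vec in residue_embeddings) / n_residues
            for i in range(dim)]


def predict_probability(protein_vector, weights, bias):
    """Single logistic unit standing in for a (hypothetical) classifier head."""
    z = sum(w * x for w, x in zip(weights, protein_vector)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Assumption: random vectors replace real language-model embeddings,
# and EMBED_DIM is far smaller than any real model's hidden size.
rng = random.Random(0)
EMBED_DIM = 8
sequence = "MKTAYIAKQR"  # hypothetical example sequence
residue_embeddings = [[rng.gauss(0, 1) for _ in range(EMBED_DIM)]
                      for _ in sequence]

# Assumption: untrained random weights, for shape illustration only.
weights = [rng.gauss(0, 0.1) for _ in range(EMBED_DIM)]

protein_vector = mean_pool(residue_embeddings)
score = predict_probability(protein_vector, weights, bias=0.0)
```

The point of the sketch is that the input to the classifier has the same length for every protein regardless of sequence length, and that no sequence-derived features need to be defined by hand.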

Using 5-fold cross-validation, we observed an F1 score of 0.64 for the previous OutCyte version and 0.83 for OutCyte UPS 2.0. Of 18 proteins described in the literature as leaderless secreted, our model correctly predicts 15. OutCyte UPS 2.0 is publicly available at www.outcyte.com.
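
For readers unfamiliar with the evaluation, the following minimal sketch shows how an F1 score is computed from binary labels and how a dataset can be divided into five folds. It is not the evaluation code behind the reported numbers; the function names and the simple unshuffled, unstratified split are assumptions chosen for brevity.

```python
def f1_score(true_labels, predicted_labels):
    """Harmonic mean of precision and recall for binary labels (1 = secreted)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Share of the 18 literature proteins that the abstract reports as
# correctly predicted.
literature_recall = 15 / 18
```

Note that 15 of 18 corresponds to a recall of roughly 83% on that small literature set, which is a separate figure from the cross-validated F1 score of 0.83.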
