Eric Deutsch (Seattle, WA / US), Charles Pineau (Rennes / FR), Cecilia Lindskog (Uppsala / SE), Christopher M. Overall (Vancouver / CA), Sandra Orchard (Cambridge / GB), Nuno Bandeira (La Jolla, CA / US), Robert L. Moritz (Seattle, WA / US), Gilbert S. Omenn (Ann Arbor, MI / US)
As the Human Proteome Project (HPP) nears completion of its first goal, confident detection of every entry in the human proteome parts list, it has embarked on a more audacious Grand Challenge Project: determining at least one molecular function of each protein. There will, of course, always be more to learn about every protein, but the HPP's goal is a solid understanding of at least one function of each protein at the molecular level, ideally via at least two orthogonal methods. Tracking progress toward this goal requires a metric, and no suitable one currently exists.
The PE ("protein existence") score, developed by Swiss-Prot decades ago, has served the HPP well in measuring progress toward its first goal: detecting each protein and recording information about where it is found. We have therefore developed an analogous FE ("function evidence") score. Just as PE1 is the ultimate goal for detection, FE1 is the ultimate goal for function: it denotes a good understanding of a protein's molecular function, although, as noted above, this does not imply that nothing more can be learned about an FE1 protein. FE5 is the lowest category, meaning that essentially nothing is known.
To be a useful metric in practice, the FE score must be computable from a repository of protein function information, and UniProtKB is currently the most complete such repository. All functional information in UniProtKB can be downloaded as an XML file, parsed, and used for calculation. The FE working group has defined a set of heuristics for computing the FE score that are based specifically on information available in UniProtKB. This information includes a free-text functional description with associated publications, Gene Ontology terms, and EC numbers. Furthermore, every annotation carries an "ECO code", a term from the Evidence and Conclusion Ontology that describes how the piece of information is known. The best ECO codes describe manual assertions based on direct experimental evidence; these stem from Swiss-Prot curators reading papers that describe experiments providing functional evidence. At the lowest level are automatic assertions based on computational results that have not been individually validated by curators.
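The parsing step described above might look roughly like the following minimal sketch. It does not implement the FE working group's heuristics, which are not spelled out here; it only shows how ECO codes attached to function comments could be collected from UniProtKB-style XML. The sample document is a simplified, made-up fragment modeled on the UniProtKB XML layout, and the accessions, the three tier labels, and the function names are hypothetical. ECO:0000269 is the code UniProtKB uses for experimental evidence in a manual assertion.

```python
import xml.etree.ElementTree as ET

# Assumption: a single "best" code for illustration. ECO:0000269 denotes
# experimental evidence used in a manual assertion. The real FE heuristics
# are NOT reproduced here.
MANUAL_EXPERIMENTAL = {"ECO:0000269"}

# Simplified, made-up fragment modeled on the UniProtKB XML layout.
SAMPLE_XML = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>DEMO1</accession>
    <comment type="function">
      <text evidence="1">Made-up functional description.</text>
    </comment>
    <evidence key="1" type="ECO:0000269"/>
  </entry>
  <entry>
    <accession>DEMO2</accession>
    <comment type="function">
      <text evidence="1">Another made-up description.</text>
    </comment>
    <evidence key="1" type="ECO:0000256"/>
  </entry>
  <entry>
    <accession>DEMO3</accession>
  </entry>
</uniprot>"""


def local(tag):
    """Strip the XML namespace so the code does not depend on its exact URI."""
    return tag.rsplit("}", 1)[-1]


def function_evidence(entry):
    """Return the set of ECO codes referenced by an entry's function comments."""
    eco_by_key = {}   # evidence key -> ECO code
    keys = set()      # evidence keys cited by function comments
    for el in entry.iter():
        name = local(el.tag)
        if name == "evidence" and "key" in el.attrib:
            eco_by_key[el.get("key")] = el.get("type")
        elif name == "comment" and el.get("type") == "function":
            # The evidence attribute may sit on the comment or on its text child.
            for node in el.iter():
                keys.update(node.get("evidence", "").split())
    return {eco_by_key[k] for k in keys if k in eco_by_key}


def classify(entry):
    """Assign one of three hypothetical tiers (not FE scores) to an entry."""
    codes = function_evidence(entry)
    if codes & MANUAL_EXPERIMENTAL:
        return "manual-experimental"
    if codes:
        return "other-evidence"
    return "no-function-annotation"


def summarize(xml_text):
    """Map each entry's first accession to its hypothetical tier."""
    root = ET.fromstring(xml_text)
    result = {}
    for entry in root.iter():
        if local(entry.tag) == "entry":
            acc = next(c.text for c in entry if local(c.tag) == "accession")
            result[acc] = classify(entry)
    return result


if __name__ == "__main__":
    for accession, tier in summarize(SAMPLE_XML).items():
        print(accession, tier)
```

A full UniProtKB download is far too large to load with `ET.fromstring`; a streaming parser such as `xml.etree.ElementTree.iterparse`, clearing each entry after it is processed, would be the more realistic choice. Gene Ontology terms and EC numbers, which the heuristics also draw on, are left out of this sketch.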
Here we present a draft of the FE score, how it is computed, and the progress of the Human Proteome Project in pursuit of achieving a solid understanding of at least one function of each protein in the human proteome at the molecular level.