The Unknome database classifies proteins based on how little is known about them, encouraging research on understudied proteins
Our DNA contains instructions for around 20,000 proteins that make up our cells and tissues. Despite decades of work, the roles of thousands of human proteins remain unclear, even though many of them are likely to participate in as yet unexplored areas of biological function. This is because research tends to focus on proteins that are already well understood.
To help address this, Sean Munro’s group in the LMB’s Cell Biology Division have created the Unknome database, which ranks proteins based on how little is known about them. Together with Matthew Freeman, a former LMB Group Leader who is now Head of the Sir William Dunn School of Pathology, University of Oxford, they performed functional screens on a subset of proteins in the database and revealed that a majority contribute to important cellular functions, including development and resilience to stress.
The Unknome database was built by Tim Stevens, a Senior Investigator Scientist working with Sean, and the statistical analysis was performed by Rajen Shah from the University of Cambridge. They started with a list of all ~20,000 human proteins and collected the available information about the function of each protein, or of closely related proteins from model organisms such as mice, flies, or yeast. They then assigned each protein a “knownness” score that reflects the quantity of available knowledge. The resulting database is publicly available and customisable.
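The scoring idea described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each piece of functional information carries a numeric weight, and that a protein's score is the largest weighted total found for the protein itself or for any of its closely related proteins. The function name, the weights, and the protein identifiers are all invented here and are not taken from the Unknome database.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    protein: str   # protein the functional information was recorded for
    weight: float  # hypothetical weight for how much that information counts


def knownness(protein, related, annotations):
    """Illustrative score: the best-annotated member of a protein's group.

    `related` maps a human protein to closely related proteins from
    model organisms (an assumed input format).
    """
    members = [protein, *related.get(protein, [])]
    totals = {member: 0.0 for member in members}
    for annotation in annotations:
        if annotation.protein in totals:
            totals[annotation.protein] += annotation.weight
    return max(totals.values())


# Toy data with made-up identifiers.
annotations = [
    Annotation("HUMAN_A", 1.0),
    Annotation("HUMAN_A", 0.5),
    Annotation("FLY_B", 2.0),
]
related = {"HUMAN_A": ["FLY_A"], "HUMAN_B": ["FLY_B"], "HUMAN_C": []}

scores = {p: knownness(p, related, annotations) for p in related}

# Least-known proteins first, mirroring a ranking by how little is known.
ranking = sorted(scores, key=scores.get)
```

In this toy example HUMAN_C has no information at all and so ranks first, while HUMAN_B inherits its score from a related fly protein.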
To assess the value of the database, João Rocha, Satish Arcot Jayaram and Nadine Muschalik used it to select 260 human genes that have comparable genes in fruit flies but whose function is almost entirely unknown. They used RNA interference to deplete the corresponding proteins in fruit flies and found that over a quarter are essential for the flies to survive. Further screens showed that a large fraction of the remaining proteins contribute to important functions including fertility, development, tissue growth, protein quality control, and stress resistance.
Overall, their approach demonstrates that significant and unexplored biology is encoded in the neglected parts of proteomes. The Unknome database provides a means to focus on these proteins and a valuable resource for guiding biological studies.
This work was funded by UKRI MRC, the Engineering and Physical Sciences Research Council (EPSRC) and the Alan Turing Institute.
Further references
Functional unknomics: systematic screening of conserved genes of unknown function. Rocha, JJ., Arcot Jayaram, S., Stevens, TJ., Muschalik, N., Shah, RD., Emran, S., Robles, C., Freeman, M., Munro, S. PLOS Biology.
Sean’s group page
Matthew Freeman – Sir William Dunn School of Pathology
Unknome database