The protein origins of biological complexity
After his Ph.D. research under Peter Pauling at University College London, Cyrus came to the LMB for three years, 1970 to 1973. He then had a grand tour: three months with Fred Richards at Yale, six months with Michael Levitt at the Weizmann Institute and two years with Joel Janin at the Institut Pasteur in Paris. He and Michael developed the "all-α, all-β, α/β and α+β" classification of protein structures. With Joel, he determined the principles that underlie protein-protein recognition and produced models that explain how secondary structures pack in proteins.
In 1976 he returned to England and for the next fourteen years was attached to University College London and the LMB. Between 1980 and 1990 he was the E.P.A. Cephalosporin Fund Senior Research Fellow of the Royal Society. He and Arthur Lesk showed that proteins adapt to mutations by changes in structure; described the mechanisms that transmit information between distant sites in proteins; and showed that there is a small repertoire of main chain conformations for immunoglobulin hypervariable regions and that those present in an antibody can be predicted from their sequence.
In 1992 he proposed that most proteins are built of domains that come from a small number of families. He collaborated with Alexey Murzin, Steven Brenner and Tim Hubbard to create the SCOP database, a periodic table for all known protein structures, and with Julian Gough to create the SUPERFAMILY database, which uses hidden Markov models to identify protein sequences that are related to those with known structures.
During the course of evolution the complexity of organisms, as measured by the number of their cells and the number of different cell types, has increased greatly. The anatomy and physiology of organisms are largely determined by the proteins encoded in their genes and by the regulation of the expression of these genes. Changes in complexity have involved changes in protein repertoires and/or changes in expression. Expansions of protein repertoire are largely produced by the processes of gene duplication, sequence divergence and gene combination.
The genome projects have provided complete, or almost complete, descriptions of protein repertoires. Our recent work on eukaryotes shows that the number of members in 200 superfamilies, of the 1,200 that were known, correlates with the complexity of the organisms in which they occur.
The increases in these superfamilies play the major role in increasing biological complexity. We are now trying to discover when the expansions of these superfamilies occurred and how they contribute to the formation of complex organisms.
There are also other projects concerned with the evolution, structure, dynamics and function of proteins.
- Vogel, C. and Chothia, C. (2006)
Protein family expansions and biological complexity.
PLoS Comput Biol 2: e48
- Chothia, C., Gough, J., Vogel, C. and Teichmann, S.A. (2003)
Evolution of the protein repertoire.
Science 300: 1701-1703
- Vogel, C., Teichmann, S.A. and Chothia, C. (2003)
The immunoglobulin superfamily in Drosophila melanogaster and Caenorhabditis elegans and the evolution of complexity.
Development 130: 6317-6328