ISSN: 0974-276X
Editorial - (2012) Volume 5, Issue 10
The completion of the Human Genome Project was followed by the humbling discovery that humans possess barely more protein-coding genes than the roundworm. Since then, humans’ greater complexity has been attributed to finely tuned regulatory and expression mechanisms, alternative splicing and post-translational modifications, and a dense protein interaction network, which together form a highly dynamic system. Indeed, the recent publication of the Encyclopedia of DNA Elements (ENCODE), which provides functional annotation for around 80% of the human genome, confirmed the intricate nature of gene expression control by revealing nearly 500,000 ‘promoter’ and ‘enhancer’ regions [1]. In addition, studies of post-transcriptional alterations have led to estimates of the size of the human proteome far higher than the number of genes, usually in the range of 100,000 to 1,000,000 proteins. Whatever the actual value, the number of possible protein-protein interactions is enormous.
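To give a rough sense of that scale, assume (purely for the sake of the arithmetic) that any two distinct proteins could in principle interact. The number of candidate binary interactions in a proteome of N proteins is then the number of unordered pairs, evaluated here at both ends of the range quoted above:

```latex
\binom{N}{2} = \frac{N(N-1)}{2}, \qquad
N = 10^{5} \;\Rightarrow\; \binom{N}{2} = 4\,999\,950\,000 \approx 5 \times 10^{9}, \qquad
N = 10^{6} \;\Rightarrow\; \binom{N}{2} = 499\,999\,500\,000 \approx 5 \times 10^{11}
```

Counting self-interactions (homodimers) would add only N further candidates, which is negligible at this scale.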
To identify these interactions, high-throughput methods are required. Proteomics approaches such as yeast two-hybrid (Y2H) screens, affinity purification combined with mass spectrometry (AP/MS) and protein microarrays have been extensively applied to decipher the human protein interactome [2]. Despite these efforts, this network remains largely unknown. Bioinformatics has therefore become the only practical way of revealing its full extent. By taking advantage of known interactions, computational methods can learn the associated patterns and predict new interactions [3]. Moreover, as experimental techniques produce a substantial number of false positives, software tools are also required to assess the validity of proposed interactions [4].
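As a minimal illustration of how known interactions can be exploited, the sketch below uses one of the simplest topology-based heuristics: two proteins that share many interaction partners are more likely to interact themselves. It scores each unobserved pair by the Jaccard overlap of their partner sets. This is not the method of [3] or [4]; the protein labels (P1 to P5) and the edge list are hypothetical, and the function names are illustrative only.

```python
from itertools import combinations

# Hypothetical toy network: labels and edges are invented for illustration.
KNOWN_INTERACTIONS = [
    ("P1", "P2"), ("P1", "P3"), ("P2", "P3"),
    ("P2", "P4"), ("P3", "P4"), ("P4", "P5"),
]

def build_adjacency(edges):
    """Return an undirected adjacency map {protein: set of partners}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def jaccard_score(adj, a, b):
    """Fraction of the combined partners of a and b that they share (0 to 1)."""
    na, nb = adj.get(a, set()), adj.get(b, set())
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0

def rank_candidates(edges):
    """Score every pair not already observed and return them, best first."""
    adj = build_adjacency(edges)
    observed = {frozenset(e) for e in edges}
    scored = [
        (a, b, jaccard_score(adj, a, b))
        for a, b in combinations(sorted(adj), 2)
        if frozenset((a, b)) not in observed
    ]
    return sorted(scored, key=lambda t: t[2], reverse=True)

for a, b, score in rank_candidates(KNOWN_INTERACTIONS):
    print(f"{a}-{b}: {score:.2f}")
# P1-P4 ranks first (0.67): both partners of P1 are also partners of P4.
```

The same overlap score can also be computed for pairs that *were* observed experimentally; some topology-based assessment approaches rely on similar neighbourhood-overlap reasoning to flag isolated, low-overlap interactions as possible false positives.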
Protein interaction data are extremely valuable for biomedical research and drug design. However, knowing that a given interaction exists is not sufficient to understand how a function is performed; this requires the atomic structure of the protein complex. Low-throughput structural techniques, mainly X-ray crystallography, nuclear magnetic resonance and, to some extent, cryo-electron microscopy, provide detailed descriptions of these interactions. However, owing to their high cost and technical limitations, there is no prospect of these techniques resolving the human structural interactome in the foreseeable future. On the other hand, the amount of structural data currently available in the Protein Data Bank (more than 80,000 entries, 55 percent of which are complexes) may be sufficient to train bioinformatics tools that aim to predict a significant portion of the human structural interactome.
Traditional docking approaches use energy-based cost functions to explore the space of configurations that the proteins involved in a complex could adopt. Since these empirical functions are imperfect, docking tends to produce a set of candidate models that must be further evaluated [5]. Alternatively, additional constraints can be integrated within docking software to reduce the size of the search space [6]. Among them, the locations of binding interfaces, hot spots or binding residues have proved particularly informative. In addition to experimental mutagenesis studies, structure-based bioinformatics approaches can provide such data. These approaches fall into two categories: analysis of local surface patches according to their chemical and geometric properties [7], and the use of templates based either on local or global structural similarity [8,9] or on homologous protein structures [10].
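The value of binding-residue constraints can be sketched in a few lines. The example below assumes that docking has already produced candidate models, each with an energy score (lower is better) and the set of receptor residues it places at the interface, and it discards models that miss too many known binding residues before ranking by energy. This post-filtering is a simplified stand-in for integrating constraints inside the search itself, as in [6]; the `Pose` type, residue labels, energies and the `min_fraction` threshold are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Pose:
    """Hypothetical docking model: identifier, energy score (lower is better)
    and the receptor residues it places at the interface."""
    name: str
    energy: float
    interface: frozenset

def filter_and_rank(poses, known_binding, min_fraction=0.5):
    """Keep poses whose interface covers at least `min_fraction` of the
    known binding residues, then rank the survivors by energy."""
    if not known_binding:
        return sorted(poses, key=lambda p: p.energy)
    kept = [
        p for p in poses
        if len(p.interface & known_binding) / len(known_binding) >= min_fraction
    ]
    return sorted(kept, key=lambda p: p.energy)

# Invented example data: residue labels and energies are illustrative only.
known = frozenset({"R45", "Y48", "D102", "K150"})
poses = [
    Pose("pose_a", -52.1, frozenset({"L10", "T12", "R45"})),
    Pose("pose_b", -48.7, frozenset({"R45", "Y48", "D102", "S103"})),
    Pose("pose_c", -45.3, frozenset({"Y48", "D102", "K150", "E151"})),
]
print([p.name for p in filter_and_rank(poses, known)])  # ['pose_b', 'pose_c']
```

Note that the lowest-energy model (`pose_a`) is rejected because it contacts only one of the four known binding residues, which mirrors the point above: an imperfect energy function alone would have ranked it first.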
With hundreds of novel high-resolution protein and complex structures released every week, and continuous improvements in docking and protein structure prediction, the large-scale production of human structural protein-protein networks is becoming a reality. Recently, the structural network of essential elements of the MAP kinase cascades, the signalling pathways directing cellular responses to potentially harmful abiotic stress stimuli, has been released [11]. A database (PrePPI) containing high-confidence predictions of more than 300,000 structures of human protein complexes is now available [12]. Moreover, partial construction of the human structural interactome has already generated medically relevant observations [13]. After mapping disease-associated mutations onto a structural network of literature-curated binary protein-protein interactions, Wang et al. discovered that a significant number of these mutations are located at protein interfaces.
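Whether "a significant number" of mutations fall on interfaces is commonly framed as an enrichment test. The sketch below uses a one-sided hypergeometric test, assuming that mutated residues are drawn without replacement from all residues in the mapped structures and that, under the null hypothesis, interface residues are no more likely to be mutated than any other. It is not a reproduction of the analysis in [13], whose statistics may differ, and the counts are invented.

```python
from math import comb

def hypergeom_sf(k, population, successes, draws):
    """One-sided tail P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: all residues in the mapped structures
    successes:  residues annotated as interface residues
    draws:      residues carrying a disease-associated mutation
    k:          mutated residues observed at an interface
    """
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(population, draws)

# Invented counts for illustration only (not data from [13]).
total_residues, interface_residues = 10_000, 2_000
mutated, mutated_at_interface = 100, 35

expected = mutated * interface_residues / total_residues  # 20 if there is no enrichment
p = hypergeom_sf(mutated_at_interface, total_residues, interface_residues, mutated)
print(f"observed {mutated_at_interface}, expected {expected:.0f}, p = {p:.2e}")
```

A small p-value indicates that more mutations sit on interfaces than random placement would predict; real analyses typically also need to control for confounders such as residue type and surface accessibility.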
These very encouraging results should not mask the many remaining challenges, such as the experimental resolution of membrane protein structures, template-free protein structure prediction, docking involving large conformational changes and the prediction of weak or transient interactions. However, while more than 10 years elapsed between the release of the human genome and its near-complete annotation, one can be confident that the next major milestone, i.e. the determination of the human structural interactome, will be reached by the end of the decade.