Briefings in Bioinformatics Advance Access published online on July 16, 2007

Briefings in Bioinformatics, doi:10.1093/bib/bbm031

Published by Oxford University Press 2007.

Protein interactions and disease: computational approaches to uncover the etiology of diseases

Maricel G. Kann

Corresponding author. Maricel G. Kann, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20894, USA. Tel: 301-402-3010; Fax: 301 480 4637; Email: kann{at}mail.nih.gov


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 CONTRIBUTIONS OF STRUCTURAL...
 DISEASES AND PROTEIN INTERACTION...
 CONCLUSIONS AND FINAL REMARKS
 Funding
 FOOTNOTES
 Acknowledgments
 References
 
The genomic era has been characterised by vast amounts of data from diverse sources, creating a need for new tools to extract biologically meaningful information. Bioinformatics is, for the most part, responding to that need. The sparseness of the genomic data associated with diseases, however, creates a new challenge. Understanding the complex interplay between genes and proteins requires integration of data from a wide variety of sources, e.g. gene expression, genetic linkage, protein interaction and protein structure, among others. Thus, computational tools have become critical for the integration, representation and visualization of heterogeneous biomedical data. Furthermore, several bioinformatics methods have been developed to formulate predictions about the functional role of genes and proteins, including their role in diseases. After an introduction to the complex interplay between proteins and genetic diseases, this review explores recent approaches to the understanding of the mechanisms of disease at the molecular level. Finally, because most known mechanisms leading to disease involve some form of protein interaction, this review focuses on the recent methodologies for understanding diseases through their underlying protein interactions. Recent contributions from genetics, protein structure and protein interaction network analyses to the understanding of diseases are discussed here.

Keywords: computational biology, diseases, genes, phenotype, proteins, protein interactions


    INTRODUCTION
‘Life is a relationship between molecules, not a property of any one molecule. So is therefore disease, which endangers life’, wrote Zuckerkandl and Pauling (1962) in their chapter on ‘Molecular disease, evolution and genic heterogeneity’ [1]. Over 40 years later, we are still far from unraveling the molecular mechanisms of most diseases and are still pondering the role of molecular interactions in healthy and diseased organisms. Indeed, proteins do not function in isolation but rather within the cell, interacting mostly with other proteins but also with other molecules such as DNA, RNA and small molecules. Thus, studies of proteins and their interactions are essential to understand their role within the cell. Here, the term ‘protein interaction’ includes a great range of events, such as transient and stable complexes, as well as physical and functional interactions.

The focus of this review is protein interactions and their role in understanding diseases. The main topic is divided into three fields of research, each addressed in its own section. The first section reviews the association of genes or proteins and their interacting partners with a particular disease. The second addresses the structural analysis of disease-related proteins, protein complexes and their mutants. The third section covers the analysis of the global properties of protein interaction networks, in particular those related to diseases. The methods and hypotheses presented here were formulated for general application to any kind of disease. When it helps to illustrate an idea or application, a particular disease is singled out and a brief description of the disease is provided. Readers are referred to the cited work for more details.

This review offers a computational perspective on a broad emerging field that considers the role of protein interactions in the etiology of diseases and the generation of new hypotheses derived from this knowledge.

Proteins and genetic diseases
This section provides a brief introduction to phenotype–genotype association studies and an overview of recent computational methodologies that prioritise disease-related genes. Gene-phenotype association and protein interaction studies are intimately related. Uncovering the mechanism by which genes (and their interactions) cause disease reveals information about the interplay between their corresponding protein products. Conversely, protein interaction studies play a major role in the prediction of new gene-phenotype relationships.

From genes to phenotype
Progress in genetic studies towards the association of phenotype with genotype has led to the identification of an increasing number of genes that contribute to human disease. Mendelian traits or diseases, named after Gregor Mendel, are those inherited and controlled by a single gene. This gene can be isolated based on its position in the chromosome by a process known as positional cloning [2]. Some examples of human disease-related genes identified by positional cloning are the genes associated with cystic fibrosis [3, 4], Huntington disease (HD) [5] and breast cancer susceptibility [6, 7]. The first step of positional cloning is linkage analysis, in which the gene is mapped using a group of DNA polymorphisms from families that segregate the disease phenotype [8]. Once the gene that predisposes to a disease is identified, its protein products and mutations can be studied to clarify the nature of the disease process. Even in simple Mendelian diseases, however, the correlation between the mutations in the genome of the patient and the symptoms might not be clear [9]. Several reasons have been suggested for this apparent lack of correlation between genotype and phenotype, as illustrated in Figure 1 [10–12]. Among them are pleiotropy (the ability of some genes to produce multiple phenotypes), environmental factors and the influence of other genes. Genes can influence each other in several ways: they can interact synergistically, one can mask the phenotypic effect of the other (a phenomenon known as epistasis), or one gene can modify another (having a small quantitative effect on the expression of the other gene). For instance, cystic fibrosis and Becker muscular dystrophy, previously considered classical examples of a Mendelian pattern of inheritance, are believed to be caused by a mutation of one gene modified by other genes [13, 14]. These observations have led to the evolving concept of oligogenic diseases, which require the interaction of a few genes and present inheritance patterns somewhere between monogenic and polygenic (reviewed in [15, 16]). These and other studies have demonstrated that even simple Mendelian diseases can lead to complex genotype-phenotype associations [12].


Figure 1: From Mendelian to complex diseases: (A) the mutation of a gene is the main cause for the phenotypic trait or disease. Gene pleiotropy, gene modifiers, and environmental factors all influence the final phenotype (see glossary of terms in Table 3). (B) Most Mendelian or single gene diseases are determined by mutations at a single locus, mostly by those that produce mutations in the coding region of the protein. Other factors discussed in (A) also affect Mendelian diseases. (C) Oligogenic diseases require the interaction of a few genes and exhibit inheritance patterns somewhere between monogenic and polygenic. In the illustration, two genes and their associated factors (represented by the hexagonal shapes) interact to produce a digenic disease. (D) Complex diseases or traits are affected by a multitude of genes (represented as black ovals) and several factors, such as environment (represented by white rectangles). Genes (and/or environmental factors) can affect other genes by enhancing (black arrows) or inhibiting (gray arrows) their action (for simplicity, arrows were drawn for only two genes).

 
To add to the complexity, most common diseases such as cancer, metabolic, psychiatric and cardiovascular disorders (e.g. diabetes, schizophrenia and hypertension) are believed to be caused by several genes (multigenic) and affected by several factors including environmental ones (e.g. diet, infection by bacteria) [17]. Despite an increasing understanding of multigenic inheritance, the study of these complex diseases remains challenging [18].

From genes to protein complexes and back
One of the main challenges scientists face today is deciphering the molecular details that lead to diseases. Even when the genetic basis of a disease is well understood, not much is known about the molecular mechanisms leading to the disorders. For oligogenic diseases, synergistic contribution of genes from several loci could explain disruptions in their products, in particular when these proteins are directly or indirectly interacting. Two models, namely the dosage [19] and the poison [20] model, have been used to explain the molecular mechanisms of the disruption (reviewed for oligogenic diseases in [16]).

The dosage model explains disruptions of two proteins within a complex. Mutations in one protein alone weaken the interaction but do not affect the phenotype. Only when both proteins are mutated does the complex fail to form, and the phenotype is then affected. For instance, mutations that affect ligand-receptor interactions could be explained with such a model.

In the poison model, mutations in one of the proteins disrupt the complex, but enough unchanged complexes are still available to maintain the function. Addition of another mutated subunit further decreases the already reduced number of normal complexes, resulting in phenotype changes. The molecular models described above could also be used to explain indirect interactions between proteins (i.e. proteins that do not physically interact but participate in the same functional pathway).
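The arithmetic behind the poison model can be illustrated with a minimal sketch. It assumes, purely for illustration and not as part of the cited models, that each complex draws one copy of every subunit independently and that a complex is normal only if all of its subunits are wild type. The function name and the 50% mutant fractions are hypothetical.

```python
from math import prod


def normal_complex_fraction(mutant_fractions):
    """Fraction of complexes in which every subunit is wild type.

    Illustrative assumption: subunits assemble independently.
    mutant_fractions: per-subunit fraction of mutated copies, each in [0, 1].
    """
    for f in mutant_fractions:
        if not 0.0 <= f <= 1.0:
            raise ValueError("fractions must lie in [0, 1]")
    return prod(1.0 - f for f in mutant_fractions)


# One subunit carries a heterozygous mutation, the other is wild type.
one_hit = normal_complex_fraction([0.5, 0.0])
# Both subunits carry a heterozygous mutation: fewer normal complexes remain.
two_hits = normal_complex_fraction([0.5, 0.5])
```

Under these assumptions, the second mutated subunit halves the pool of normal complexes again, which is the kind of further reduction the poison model invokes to explain a change in phenotype.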

The increasing knowledge about protein networks can be used towards identifying new genes and genetic mechanisms behind diseases. For instance, if the gene products (proteins) have any functional interaction, one could trace these proteins back to their respective genes and identify the genes responsible for the disease. Identifying genes associated with complex diseases from all possible candidates generated from genome-wide genetic linkage studies would involve searching through hundreds of genes. Several computational approaches to prioritise genes related to diseases have been developed to aid linkage analysis and association studies. Some of these methods rely on sequence and functional differences between disease-causing proteins and others not related to any disease. For example, sequence-based properties such as length, conservation across species and number of paralogs have been used to create disease classifiers [21–24].

Other methods tend to integrate several sources of data, like gene expression, gene ontology (GO) annotation, and disease phenotype annotation from the Medical Subject Headings (MeSH) and Online Mendelian Inheritance in Man (OMIM) [25–36]. The performance of seven of these computational methods [21, 23, 25, 30–32, 36] was recently reviewed by Tiffin et al. [37] and applied to the analysis of candidate genes for type 2 diabetes (T2D) and obesity. The authors identified a set of primary candidates with nine T2D genes and five obesity genes (selected by six out of seven methods). They also generated a secondary set of 94 and 116 gene candidates for T2D and obesity, respectively (found by five out of seven methods), of which 58 were common to both diseases. This study reviews seven independent computational methods and illustrates how these methodologies can be used to identify genes related to complex diseases, followed by an interesting discussion on the integration of results from different methodologies.
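The idea of keeping only candidates selected by several independent methods can be sketched as a simple vote count. This is a hedged illustration of consensus selection in general, not the procedure used by Tiffin et al.; the method names, gene labels and thresholds are placeholders.

```python
from collections import Counter


def consensus_candidates(predictions, min_votes):
    """Return genes selected by at least `min_votes` methods, sorted by name.

    predictions: dict mapping a method name to the set of genes it selects.
    """
    votes = Counter(gene for genes in predictions.values() for gene in set(genes))
    return sorted(gene for gene, n in votes.items() if n >= min_votes)


# Hypothetical output of three prioritisation methods (placeholder labels).
predictions = {
    "method_a": {"GENE1", "GENE2", "GENE3"},
    "method_b": {"GENE2", "GENE3"},
    "method_c": {"GENE3", "GENE4"},
}
primary = consensus_candidates(predictions, min_votes=3)    # agreed by all
secondary = consensus_candidates(predictions, min_votes=2)  # agreed by most
```

Lowering `min_votes` trades specificity for coverage, which mirrors the distinction between the primary and secondary candidate sets described above.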

Several of these methods explicitly incorporate protein interaction data. Oti et al. [38] specifically analysed the effect of incorporating protein–protein interactions into the prediction of disease-causing gene candidates. Their results suggest that the inclusion of interaction data yields approximately a 10-fold improvement in gene candidate identification. These methods are limited by the quality and sparseness of the experimental protein interaction data. Advances in the experimental and computational approaches towards an accurate identification of the interactions will therefore help to improve these methods.

The problem of gene-phenotype association is complicated further by the fact that not only can mutations in multiple genes cause one disease, but multiple syndromes can also be caused by mutations in the same gene. This provides an explanation for the continuum of syndromes and their overlap (illustrated in Figure 2) [39]. For instance, the overlap of human malformation syndromes leads to the concept of ‘syndrome families’ [40]. It is possible that these diseases arise from the disruption of strongly related genes (i.e. in the same protein complex or pathway). For example, Fanconi anemia and Usher syndrome type 1 both represent diseases produced by mutations in several genes involved in the same complex [41, 42]. Sam et al. [43] have developed a method to compare diseases based on their shared network of protein interactions. Upon manual examination of significantly correlated disease pairs found by this method, the authors confirmed that, for several of them, the correlation was previously reported in the literature. For instance, Cockayne syndrome and Xeroderma Pigmentosum are predicted to be correlated through their protein–protein interaction networks, as expected from the literature [44].
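One simple way to quantify how much two diseases share at the network level is to compare the sets formed by their disease proteins and direct interaction partners. The sketch below uses a Jaccard index for this purpose. It is an assumption-laden stand-in and not the statistic used by Sam et al.; the function names, protein labels and interactions are hypothetical.

```python
def disease_neighbourhood(disease_proteins, interactions):
    """Disease proteins plus their direct interaction partners."""
    disease_proteins = set(disease_proteins)
    module = set(disease_proteins)
    for p, q in interactions:
        if p in disease_proteins:
            module.add(q)
        if q in disease_proteins:
            module.add(p)
    return module


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy interaction list with placeholder protein names.
interactions = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
d1 = disease_neighbourhood({"A"}, interactions)  # {A, B}
d2 = disease_neighbourhood({"C"}, interactions)  # {B, C, D}
d3 = disease_neighbourhood({"E"}, interactions)  # {E, F}
overlap_12 = jaccard(d1, d2)  # the two diseases share partner B
overlap_13 = jaccard(d1, d3)  # no shared proteins
```

A real analysis would also need a significance estimate, for example by comparing against randomised networks, which this sketch omits.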


Figure 2: The interaction networks for two diseases are represented by two connected graphs (D1 and D2) and the overlapping Venn diagrams. The diseases share a set of related proteins, namely a ‘disease module’ (enclosed by a rectangle). The two hypothetical diseases would likely share a set of phenotypic features. The networks depicted are from an arbitrary set of proteins and were obtained using Cytoscape [137].

 
A recently published method, developed by Lage et al. [45], illustrates the use of knowledge regarding protein interactions to prioritise genes within a linkage interval (these intervals were obtained from genetic linkage analysis data extracted from the OMIM and GeneCards databases). The patient phenotype associated with this interval is compared to the phenotype of all disease-related proteins interacting with each of the candidate proteins (gene products). A Bayesian predictor based on the pairwise score between phenotypes (obtained using text mining techniques) is used to rank all gene candidates extracted from OMIM's 870 linkage intervals linked to disease. This method, which capitalises on the fact that interacting proteins are often responsible for similar phenotypes, produced several novel putative disease-causing genes. This approach, and related ones, are limited by the reliability of protein networks and the sparseness of protein-disease association data, and would greatly benefit from a more accurate description of phenotypes (as described subsequently).
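The general principle of ranking candidates by the phenotypes of their interaction partners can be sketched as follows. This is a deliberately simplified illustration: it replaces the text-mining score and Bayesian predictor of Lage et al. with a plain word-overlap similarity and a maximum over partners. All names, phenotype strings and data structures are hypothetical.

```python
def phenotype_similarity(a, b):
    """Word-overlap (Jaccard) similarity between two phenotype descriptions."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def rank_candidates(patient_phenotype, candidates, partners, partner_phenotypes):
    """Rank candidates by the best phenotype match among their partners.

    partners: dict candidate -> iterable of interacting proteins.
    partner_phenotypes: dict protein -> phenotype text (disease proteins only).
    """
    scores = {}
    for candidate in candidates:
        sims = [
            phenotype_similarity(patient_phenotype, partner_phenotypes[p])
            for p in partners.get(candidate, ())
            if p in partner_phenotypes  # skip partners with no known phenotype
        ]
        scores[candidate] = max(sims, default=0.0)
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_candidates(
    "retinal degeneration hearing loss",
    ["C1", "C2", "C3"],
    {"C1": ["P1"], "C2": ["P2", "P3"], "C3": []},
    {"P1": "hearing loss", "P2": "muscle weakness"},
)
```

Candidates whose partners have no annotated phenotype receive a score of zero here, which illustrates how sparse protein-disease annotation limits this family of methods.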

Another major challenge is the integration and organization of phenotypic databases. NIH recently acknowledged this need by launching the whole genome association studies. The NCBI's database, dbGaP [46], provides open access to summary data and controlled access to individual-level data for several genotype association studies. A recent review by Lussier et al. [47] points to the challenges faced by the emerging field of high-throughput approaches for studying genotype–phenotype relationships, namely the field of phenomics. Advances in this area will depend on robust taxonomies for phenotypes and on the accuracy of their clinical description [48, 49].


    CONTRIBUTIONS OF STRUCTURAL ANALYSIS TO THE UNDERSTANDING OF DISEASES
In many cases, a clear understanding of the malfunction that ultimately causes one or several diseases can only be achieved when the protein interactions are known at the molecular level. The three-dimensional structure of the protein interaction complex, whether available or modeled, can provide such detail. Furthermore, understanding the binding at such a level is critical for the rational design of new therapeutic agents targeted to disrupt interactions that cause disorders. The following section examines the contribution of structural analysis to the understanding of diseases, and provides an overview of several disease-related resources.

Protein structure, protein complexes and disease
To complement classical structural biology, the structural genomics (SG) projects aim to solve X-ray or NMR structures in a high-throughput manner. The SG initiative's goal is to provide three-dimensional structural models for all proteins encoded by complete genomes [50]. Most experimentally derived structures, however, might not be directly related to any human disease [51]. This requires computational homology studies to obtain models for the human or pathogen proteins relevant to diseases. For instance, of the over 40 000 proteins with known structure deposited in the Protein Data Bank (PDB) [52], only a few hundred are known to be related to diseases. Several computational approaches have been implemented to predict function from protein sequence and structure information (see reviews in [53, 54]). However, experimental techniques are still needed to validate the functions of these proteins.

Studies of inherited or somatic non-synonymous mutations constitute the main source for the analysis of the etiology of diseases at the molecular level. A distinction should be made between the rare mutations responsible for functional disruptions that lead to disease and the large number of common variations in the human genome derived from high-throughput single nucleotide polymorphism (SNP) analysis experiments [55]. The majority of the non-synonymous SNPs (nsSNPs), in particular those that are present in a large number of individuals, are probably not associated with any disease. Rare variants (found in a very small percentage of the population), on the other hand, tend to occur at structurally and functionally relevant sites. This suggests that structural information can be valuable for understanding the effect of mutations and nsSNPs [56]. Several computational methods based on stability, evolutionary and structural information have been developed to predict the impact of a mutation on protein function. Resources related to these methods are listed in Table 1 (see the review by Mooney for more details [57]). The main drawback of these methods is their low accuracy, which has been shown to improve with the addition of structural information [58, 59]. Even if the disruption of the function is correctly predicted, none of these methods offers insight into how the mutation affects the function.

Table 2 provides a list of several publicly available genomic databases containing disease information. This list is by no means exhaustive. For instance, for well-studied diseases such as cancer, there are several disease-specific resources available that might or might not be included in the data sources listed subsequently (e.g. GeneCards, described subsequently, includes information from CGAP, the National Cancer Institute's Cancer Genome Anatomy Project). The OMIM database [60], manually curated and updated daily, is one of the largest catalogs of human genes and disorders.
As part of the NCBI Entrez database, OMIM is freely available and contains over 11 000 genes with known sequence and over 6000 phenotypes. It should be noted that only a few hundred of the genes with known sequences currently annotated in OMIM have known phenotypes. Automatic approaches for linking genotype with phenotype information have the potential to overcome the data scarcity problem inherent in manual efforts. To this effect, several approaches (such as PhenoGO [61], which uses natural language processing in combination with GO data) have been developed to create a collection of over 500 000 phenotype-GO associations, including approximately 33 000 genes from 10 species. Similarly, Gene2Disease automatically assigns priorities to genes related to a disease, and provides a list of candidates based on PubMed MeSH terms and GO. Another resource, GeneCards [62], provides a suite of tools that integrate information from over 70 sources including OMIM, constituting a single location to retrieve available information for over 24 000 genes, including relationships to diseases when available. The PhenomicDB [63] database uses orthology relations to provide multi-species genotype–phenotype mappings across human and several model organisms. The OrthoDisease database provides clusters of more than 3000 disease genes across 26 eukaryotic organisms. Swiss-Prot is a database of protein sequences that includes disease annotations for about 2600 of its 270 000 entries (16 600 are for human proteins). Finally, PharmGKB [64] is a catalog of over 300 genes and 400 diseases (with genes involved in drug response), providing a single platform to study relationships between drugs, diseases and genes. Users will find that most of these databases are freely available (GeneCards is limited to non-profit institutions) and that their interfaces vary in flexibility and convenience. Almost all of them can be easily searched using related words in the query (disease or gene).
In addition, the use of standard vocabularies and ontologies within all these databases needs to expand beyond GO, so that descriptions of disease phenotypes, cytological changes, and molecular mechanisms can be well-defined and standardised for better discoverability, correlations and mining. In general, while these databases provide an excellent resource, only a small proportion of the genomic data known to be involved in an inherited disease have both known gene sequence and known phenotype. Despite the scarcity of the structural data related to disease, Moult and collaborators have shown that for a set of genes associated with monogenic diseases, the loss of protein stability is a major factor contributing to disease [65].


Table 1: Resources for SNP validation

 

Table 2: Databases with disease annotation

 
Structural data, when used in combination with information about mutations responsible for disease, could be essential in unveiling the molecular mechanisms leading to diseases. A study by Thornton and collaborators showed that the patterns of mutations of residues associated with human inherited diseases (from the OMIM database) differ from those of the large number of nsSNPs (from the dbSNP database) [66]. Their results showed that mutations that lead to major changes in hydrophobicity were more frequent in the OMIM data than in dbSNP. In addition, they found that disease-related mutations are more likely to be buried in the protein structure than would be expected for the average protein residue. These results are consistent with several other studies, including Vitkup et al. [67] and a more recent study by Ye et al. [68]. Conversely, common nsSNPs are less likely to be in protein cores than expected on average, a feature that could be useful when predicting the functional impact of an nsSNP (discussed in the previous subsection).

On the other hand, the protein core represents only a fraction of all residues. Consequently, many of the disease-related mutations lie in solvent accessible sites, suggesting that the analysis of these mutations might also shed light on the mechanisms of the disease. For instance, Thornton and collaborators estimated that more than half of the disease-related mutations analysed in their study occur at solvent accessible sites [66]. In their analysis, Ye et al. found that disease-related mutations located on the protein surface tend to be clustered, forming surface patches, while SNPs are uniformly distributed [68]. This could explain the role of mutations in disease, since mutations in the binding site would likely disrupt the protein interaction and function. To reach this conclusion, the authors compared the location and distribution of disease-related mutations with nsSNPs on a set of protein domains obtained through homology modeling of disease-related proteins. The authors verified that, for a smaller subset of experimentally determined structures, disease-related mutations are located mostly on the binding interfaces of proteins.

Protein structural analysis has helped to elucidate the molecular basis of several diseases. One example is the disruption of a protein interaction in Von Hippel-Lindau (VHL) syndrome: a common mutation from tyrosine to histidine at residue 98 (represented as Tyr98His), which is part of the binding site, disrupts the binding of the VHL protein to a protein called the hypoxia-inducible factor (HIF). As a result, the VHL protein no longer degrades HIF, leading to the expression of angiogenic growth factors and local proliferation of blood vessels [69, 70]. Another example, given its central role in cancer, is the extensive study of mutations of the p53 tumor suppressor protein. Structural analysis of mutations in p53 could facilitate the dissection of their functional role, in particular their effect on DNA binding, which seems to be key in human cancers [71, 72]. For instance, a mutation in the DNA binding region (Arg273His) has been associated with Li-Fraumeni syndrome and low p53 DNA binding. On the other hand, the mutation of the arginine at position 175 to histidine (Arg175His) affects the stabilization of the protein, which might in turn regulate the binding to DNA.

In some cases, interactions between two proteins might involve order-disorder transitions in partially disordered regions of the interacting proteins during the binding process. These unstructured or disordered regions have been found to be involved in many disease mechanisms [73–75] (see [76] for a review on intrinsically disordered proteins). For instance, the cancer suppressor BRCA1 has been shown to contain intrinsically disordered regions through which it binds to several proteins [49]. Similarly, the surface proteins of some bacterial pathogens contain intrinsically disordered regions. Host-pathogen protein interactions, once characterised structurally, constitute an excellent system for targeting by drug designers. The NMR determination of the structure of one of these surface proteins, namely the streptococcal fibronectin-binding protein (FnBP), in complex with human fibronectin provided mechanistic details on how the bacteria target the host cell (see review [77]).


    DISEASES AND PROTEIN INTERACTION NETWORKS
This section explores the study of protein networks, with a focus on protein–protein interactions, and their impact on understanding diseases. First, it provides an overview of the experimental and computational approaches that have been used to reconstruct the network of human protein interactions (or human interactome). It then lists the basic concepts that define the general properties of a network and introduces recent contributions to biology from this theoretical perspective. Finally, recent approaches to create disease-related protein interaction networks are discussed. Examples of experimental and computational methodologies for network reconstruction are provided.

From one to thousands of interactions
In the past, experimental techniques were limited to revealing a handful of protein–protein interactions at a time. For instance, genetic, biochemical and biophysical techniques mostly study individual interactions [78]. Recent high-throughput experimental analyses have dramatically increased the amount of interaction data generated, making possible the reconstruction of whole genome protein networks (see [79–84] and reviews in [85, 86]). These genome-wide analyses rely on the development of computational approaches to understand and visualise these data. Bioinformatics tools could also generate predictions of new functional roles of proteins from existing genomic data. Therefore, bioinformatics has a dual role in the context of protein interaction and diseases: prediction of putative protein interactions and of new gene-disease associations (see previous section), and development of a framework to integrate, represent, and visualise experimental data.

Computational techniques to predict protein interactions have been developed in parallel with experimental advances. These approaches rely on the fact that interacting proteins are more likely to be present in the same set of organisms [87, 88], to conserve the gene order [89, 90], or to be fused in some organism [91, 92]. These methods have been successfully used to predict protein interactions but still have many limitations (see reviews [93–96]). The assumption that interacting proteins co-evolve to preserve their function has led to methods that rely on similarities between the evolutionary histories of interacting protein families to predict interacting partners [97–104]. These methods are widely applicable and only require the protein sequence as input. However, the signal from functional co-evolution can sometimes be difficult to detect, resulting in low accuracy in the predictions. To address this problem, the technique was recently improved by subtracting the signal from speciation events of unrelated sequences [105, 106] and removing high entropy regions (i.e. regions poorly conserved across species) of the sequences [107].
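The first of these ideas, that interacting proteins tend to be present in the same set of organisms, can be sketched by comparing presence/absence profiles. The example below is a minimal illustration under stated assumptions: the profiles are invented, the five genomes are hypothetical, and the simple matching fraction stands in for whatever score a published method would use.

```python
def profile_similarity(profile_a, profile_b):
    """Fraction of organisms in which two proteins are both present or both absent.

    Each profile is a sequence of 0/1 flags, one per organism, in the same order.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same organisms")
    if not profile_a:
        raise ValueError("profiles must not be empty")
    matches = sum(a == b for a, b in zip(profile_a, profile_b))
    return matches / len(profile_a)


# Hypothetical presence (1) / absence (0) of three proteins in five genomes.
protein_x = (1, 0, 1, 1, 0)
protein_y = (1, 0, 1, 1, 0)  # same pattern as X: candidate functional partner
protein_z = (0, 1, 0, 1, 1)  # mostly different pattern

sim_xy = profile_similarity(protein_x, protein_y)
sim_xz = profile_similarity(protein_x, protein_z)
```

Pairs with highly similar profiles are candidates for functional linkage, not confirmed physical interactions, which is one of the limitations noted above.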

Proteins meet graph theory
Protein–protein interaction data obtained from high-throughput experimental approaches can be represented as a graph [108, 109]. Proteins constitute the nodes of this graph and interactions between the proteins are represented as lines connecting the nodes. Biological networks have been found to be comparable to communication and social networks. Protein–protein interaction and communication networks share several commonalities, such as scale-free and small-world properties (see definitions in Table 3) [110]. Scale-free networks are fairly robust against random errors but are highly vulnerable to perturbations in highly connected nodes [111]. Certain properties of the protein network could be used to differentiate disease from non-disease proteins. Based on this approach, Xu et al. [112] devised a classifier based on several topological features of the protein interaction network to predict genes related to disease. The classifier was trained on a set of non-disease genes and one of disease genes (from OMIM) and applied to a set of over 5000 human genes. As a result, 970 disease genes were identified with 792 of them already listed in OMIM. Some of the 178 newly predicted disease gene candidates were validated by biological experiments.
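The graph representation described above can be sketched with an adjacency mapping in which each protein points to the set of its interaction partners. Node degree is one of the simplest topological features and identifies highly connected proteins (hubs). The interaction list, protein labels and degree cut-off below are hypothetical, and this is not the feature set used by Xu et al.

```python
from collections import defaultdict


def build_network(interactions):
    """Undirected graph: protein -> set of interaction partners."""
    network = defaultdict(set)
    for p, q in interactions:
        if p != q:  # self-interactions are skipped in this sketch
            network[p].add(q)
            network[q].add(p)
    return network


def degrees(network):
    """Number of distinct interaction partners per protein."""
    return {protein: len(partners) for protein, partners in network.items()}


def hubs(network, min_degree):
    """Proteins with at least `min_degree` partners, sorted by name."""
    return sorted(p for p, k in degrees(network).items() if k >= min_degree)


# Toy interaction list with placeholder protein names.
network = build_network([("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")])
degree_by_protein = degrees(network)
hub_proteins = hubs(network, min_degree=3)
```

Features such as these degrees could then serve as inputs to a classifier that separates disease from non-disease genes.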



 
Table 3 Glossary of terms

 
Protein interaction networks could be used to improve functional annotation, since the function of some proteins can be inferred from their role in pathways or protein complexes [113, 114]. Likewise, information about key nodes in disease-related networks could be used in drug discovery. Drug target identification is a good example of the potential of integrating structural data with high-throughput data [115]. The structural details of the binding or allosteric sites could be used to design molecules that affect protein function. At the same time, reconstruction of the different protein networks in which the potential target is involved (signaling, metabolic, regulatory, etc.) is needed to predict the overall impact of the disruption. If, for example, the target is highly connected (a hub), its inhibition may affect many activities that are essential for the proper function of the cell, making it unsuitable as a drug target. Less connected nodes that mainly affect the pathway leading to disease, on the other hand, could constitute vulnerable points of the disease-related network and are thus better drug target candidates. Ultimately, a more complex systems biology approach that integrates and mathematically models the gene, protein and pathway responses would be needed to fully characterise the effects of the system disruptions caused by the drug.
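One crude way to illustrate why inhibiting a hub is more disruptive than inhibiting a peripheral node is to measure how much the largest connected component of the network shrinks when a node is removed. The sketch below is only a topological proxy under simplifying assumptions (an unweighted, static network with made-up node names such as `H` and `E`); it is not a substitute for the systems-level modelling mentioned above, and the function names are hypothetical.

```python
from collections import deque


def largest_component_size(graph, removed=frozenset()):
    """Size of the largest connected component, ignoring the `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over the remaining nodes
            node = queue.popleft()
            size += 1
            for partner in graph[node]:
                if partner not in seen:
                    seen.add(partner)
                    queue.append(partner)
        best = max(best, size)
    return best


def removal_impact(graph, node):
    """Drop in largest-component size when `node` is removed from the network."""
    return largest_component_size(graph) - largest_component_size(graph, {node})


# Toy network: H is a hub; E is a peripheral node attached to D.
toy_network = {
    "H": {"A", "B", "C", "D"},
    "A": {"H"},
    "B": {"H"},
    "C": {"H"},
    "D": {"H", "E"},
    "E": {"D"},
}

impact_hub = removal_impact(toy_network, "H")
impact_peripheral = removal_impact(toy_network, "E")
```

In this toy network, removing the hub fragments the graph, whereas removing the peripheral node only removes the node itself from the largest component.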

Reconstructing the interaction networks of proteins and their mutants involved in a disease might be the key to understanding the differences between healthy and diseased organisms. Recent work by Goehler et al. [116] on HD illustrates the potential of those approaches. HD is an autosomal dominant neurodegenerative disease. Currently, there is no pharmacological treatment to prevent the progression of this rare inherited disorder [117]. HD is caused by the repeat expansion of the trinucleotide CAG in the Huntingtin (Htt) gene and is one of several polyglutamine (or polyQ) diseases. This expansion causes aggregation of the mutant Htt in insoluble neuronal inclusion bodies, which consequently leads to neuronal degeneration. Goehler et al. [116] reported an experimental strategy to generate the protein–protein interaction network of all proteins related to HD, revealing many new interactions and permitting the functional annotation of several uncharacterised proteins. Most importantly, they discovered an interaction of Htt with GIT1, a GTPase-activating protein that seems to be required for Htt aggregation. Upon further validation, GIT1 could constitute an excellent target for therapeutic strategies [118, 119].

Towards a similar goal, Lim and collaborators [120] developed the network of the interactions among proteins related to ataxias and disorders of Purkinje cell degeneration. They found that most of the proteins related to ataxias interact directly or indirectly with each other. A more recent study corroborates these findings across all of the disease proteins from OMIM [121]. Thus, proteins related to a disease are more likely to interact with proteins already known to cause similar diseases. This principle motivates several of the computational gene prioritization studies presented in the previous section. Chen et al. [122] presented a computational approach to test and confirm this principle for the subnetwork of interacting proteins associated with Alzheimer's disease (AD) (see [123] for more details about AD and other diseases associated with aggregates, namely amyloid fibrils, that result from protein misfolding). They devised a computational method to enrich the AD subnetwork based on a heuristic score. The score prioritises proteins with high specificity (favoring the addition of proteins that are not promiscuously connected) and with high confidence in their interaction data (this weighting addresses the unreliability of the interaction data).

In an attempt to derive common features among cancer proteins, Jonsson and Bates [124] performed a systematic computational study of a subset of proteins related to cancer. The authors found that the network topology of cancer-related proteins is quite different from that of proteins not involved in the disease, i.e. cancer proteins are highly connected with other cancer-related proteins. In addition, a study of the protein network of herpesvirus performed by Uetz et al. [125] indicates that viral networks differ significantly from cellular networks, which raises the hypothesis that other intracellular pathogens might also have distinguishing topologies. In a recent study, Goh et al. [126] explored the properties of the human disease network for all known associations between disease phenotypes and genes. The authors found that genes that are essential in early development tend to encode highly connected proteins (hub proteins). Surprisingly, their results suggest that the vast majority of disease-related genes are non-essential and show no tendency to encode hub proteins.
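The idea of enriching a disease subnetwork with a score that rewards both specificity and interaction confidence, as described above for the AD subnetwork, can be illustrated with a toy scoring function. The formula below (summed confidence of a candidate's interactions with known disease proteins, divided by its total number of partners) is a hypothetical stand-in chosen for simplicity; it is not the heuristic score published in [122], and all protein names and confidence values are made up.

```python
def enrichment_score(candidate, seeds, weighted_edges):
    """Confidence-weighted links to seed proteins, penalised by total partner count.

    `weighted_edges` maps frozenset({a, b}) to a confidence value in [0, 1].
    """
    seed_confidence = 0.0
    n_partners = 0
    for pair, confidence in weighted_edges.items():
        if len(pair) != 2 or candidate not in pair:
            continue
        (partner,) = pair - {candidate}
        n_partners += 1
        if partner in seeds:
            seed_confidence += confidence
    return seed_confidence / n_partners if n_partners else 0.0


def rank_candidates(seeds, weighted_edges):
    """Rank all non-seed proteins by decreasing score (ties broken by name)."""
    proteins = set().union(*weighted_edges)
    scores = {p: enrichment_score(p, seeds, weighted_edges) for p in proteins - set(seeds)}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy data: S1 and S2 are known disease proteins (seeds).
seeds = {"S1", "S2"}
weighted_edges = {
    frozenset({"S1", "X"}): 0.9,   # X: few partners, both are seeds
    frozenset({"S2", "X"}): 0.8,
    frozenset({"S1", "Y"}): 0.9,   # Y: promiscuous, only one seed partner
    frozenset({"Y", "Z1"}): 0.9,
    frozenset({"Y", "Z2"}): 0.9,
    frozenset({"Y", "Z3"}): 0.9,
    frozenset({"S2", "W"}): 0.2,   # W: low-confidence link to a seed
}

ranking = rank_candidates(seeds, weighted_edges)
```

With these toy values, the specifically connected, high-confidence candidate `X` ranks above the promiscuous `Y` and the low-confidence `W`, which mirrors the two criteria named in the text.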


    CONCLUSIONS AND FINAL REMARKS
 
Protein interactions are involved in metabolic, signaling, immune and gene-regulatory networks. A better understanding of protein interactions, either with other proteins or with DNA, RNA, membranes or small molecules, could reveal the molecular mechanisms of the processes leading to diseases. Mutations in the protein interaction interface (or related sites, e.g. active sites, allosteric binding sites) can evidently disrupt the protein interaction. For example, the etiology of several of the diseases mentioned in this review lies in the disruption of a protein–DNA interaction (e.g. p53 in cancer). For other diseases, the main cause is the disruption of protein stability or folding, which either destroys one or several protein–protein interactions (e.g. VHL) or creates new undesired ones (e.g. unfolded proteins tend to aggregate, as in AD and HD). Clearly, pathogen–host protein interactions are central to the way bacteria or viruses hijack the host immune system.

The study of the phenotypic commonalities of several disorders points to common modules that are responsible for inherited diseases. Evidence of these modules, namely protein complexes or functionally related proteins, emerges from several studies of protein and gene interactions reviewed here. However, several other factors could influence the final disease outcome. These confounders challenge the concept of modularity of genetic diseases (see [39]). Understanding the role of these factors, as illustrated in Figure 1, is a challenge central to the etiology of both Mendelian and complex diseases.

The processes leading to diseases are extremely complex, and so are the proteins and interactions involved in them. Most of the methodologies reviewed here use a simplistic ‘static’ view of proteins and their networks. In reality, proteins are continuously being synthesised from and degraded into amino acids. The kinetics of these processes and the dynamics of the networks need to be considered to achieve a complete understanding of how the disruptions of proteins and their interactions lead to disease. Finally, it is important to also consider the context-specific (tissue, disease stage and response) functions of protein interactions.

It is clear, then, that a gap still exists between the identification of the disease-associated protein interaction network and the complete understanding of the disease mechanism. The gap is filled, unfortunately, with more questions than answers. The approaches reviewed here have generated a considerable amount of valuable data, but also the need for further validation. To that end, a number of studies have used data from simpler organisms, such as worm (Caenorhabditis elegans), fly (Drosophila melanogaster) or yeast (Saccharomyces cerevisiae). There are limitations, however, to the transfer of interaction annotation across species, in particular between distantly related organisms [127]. A study of the overlap between the interaction networks of fly, worm, yeast and human showed that there are only a few conserved interactions among these organisms [121]. Despite these limitations, modeling different aspects of a disease in simpler organisms has proven to be extremely useful. For instance, several aspects of the polyQ diseases (see previous section) were modeled using worm, fly and yeast [128–131].

We are still far from the goal of understanding the etiology of most diseases. Further advances in relevant experimental technologies, i.e. genetic linkage, protein interaction, protein structure and gene expression, along with computational tools to organise, visualise and integrate these data, will provide a step forward in that direction. In particular, the completion of the human protein interactome will provide data that could enhance several of the methodologies reviewed here. In addition, a systematic experimental genome-wide study of protein interactions between host and pathogen, which is not yet available in the literature, could provide insight into the mechanisms of pathogenicity of bacteria, viruses and parasites. Finally, valuable information about disease and protein interactions is buried within millions of biomedical records. Text mining approaches are therefore essential to recover such information. Indeed, several of the databases and methods to prioritise disease-related genes discussed in previous sections (e.g. Lage et al. [45]) have successfully incorporated text mining techniques.

Ideally, since network and structural approaches are complementary, the combination of network studies with a more detailed structural analysis has the potential to be an excellent framework for the study of disease mechanisms and rational design of drugs. In future, this strategy, and others discussed here, should be integrated into multidisciplinary disease-specific projects that provide a better understanding of a particular disease and help identify disease modules (if any) that are common to related disorders.


Key Points

  • Disruption of an existing protein interaction (by changing the stability of the protein and/or inhibiting the ability to bind to other molecules), production of new undesirable interactions (through mutations that result in misfolding of the protein and aggregation) and disruption of protein-DNA interactions (by affecting gene regulation) are the causes of many diseases.
  • Mutations in one gene could affect other genes. If their protein products interact, the resulting disruption of this interaction could lead to a disease.
  • Analysis of disease-related protein networks confirms that proteins involved in a disease tend to interact with other proteins involved in the same disease.
  • A gene could have a role in several diseases; thus, many diseases could share interaction subnetworks.

 


    Funding
 
Support for this work was provided by the Intramural Research Program of the National Institutes of Health, National Library of Medicine.


    Acknowledgments
 
Thanks to all the scientists (included or not in this review) that contributed with their excellent work to the field of research reviewed here. Many thanks to Sonia Leach, Leonardo Marino, Eric Neumann, Anna Panchenko, Pedja Ravidojac, Willy Valdivia, Adam Godzik and anonymous reviewers for their helpful comments on the manuscript, to the NIH Fellows Editorial Board for their editorial work, and to Robert Yates for his help in the graphic design of the pictures.


    FOOTNOTES
 
Maricel Kann is a postdoctoral fellow at the National Center for Biotechnology Information, NIH (Bethesda, USA). Her research interests include methods for alignment and statistics of protein sequences, predictors of protein–protein interactions, and domain–domain interaction networks and their associations with disease. She has co-chaired several sessions at international bioinformatics conferences related to protein interactions and disease.

Submitted: April 9, 2007. Received (in revised form): June 6, 2007.


    References
 

  1. Zuckerkandl E, Pauling L. Molecular disease, evolution, and genic heterogeneity. In: Horizons in Biochemistry. Kasha M, Pullman B, eds. (1962) New York: Academic Press. 189.
  2. Botstein D, White RL, Skolnick M, et al. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet (1980) 32:314–31.
  3. Kerem B, Rommens JM, Buchanan JA, et al. Identification of the cystic fibrosis gene: genetic analysis. Science (1989) 245:1073–80.
  4. Riordan JR, Rommens JM, Kerem B, et al. Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA. Science (1989) 245:1066–73.
  5. Gusella JF, Wexler NS, Conneally PM, et al. A polymorphic DNA marker genetically linked to Huntington's disease. Nature (1983) 306:234–8.
  6. Miki Y, Swensen J, Shattuck-Eidens D, et al. A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science (1994) 266:66–71.
  7. Wooster R, Bignell G, Lancaster J, et al. Identification of the breast cancer susceptibility gene BRCA2. Nature (1995) 378:789–92.
  8. Botstein D, Risch N. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat Genet (2003) 33(Suppl):228–237.
  9. Scriver CR, Waters PJ. Monogenic traits are not simple: lessons from phenylketonuria. Trends Genet (1999) 15:267–72.
  10. Sriram G, Martinez JA, McCabe ERB, et al. Single-gene disorders: what role could moonlighting enzymes play? Am J Hum Genet (2005) 76:911–24.
  11. Dipple KM, McCabe ER. Modifier genes convert "simple" Mendelian disorders to complex traits. Mol Genet Metab (2000) 71:43–50.
  12. Dipple KM, McCabe ER. Phenotypes of patients with "simple" Mendelian disorders are complex traits: thresholds, modifiers, and systems dynamics. Am J Hum Genet (2000) 66:1729–35.
  13. Groman JD, Meyer ME, Wilmott RW, et al. Variant cystic fibrosis phenotypes in the absence of CFTR mutations. N Engl J Med (2002) 347:401–7.
  14. Sun H, Smallwood PM, Nathans J. Biochemical defects in ABCR protein variants associated with human retinopathies. Nat Genet (2000) 26:242–6.
  15. Agarwal S, Moorchung N. Modifier genes and oligogenic disease. J Nippon Med Sch (2005) 72:326–34.
  16. Badano JL, Katsanis N. Beyond Mendel: an evolving view of human genetic disease transmission. Nat Rev Genet (2002) 3:779–89.
  17. Van Heyningen V, Yeyati PL. Mechanisms of non-Mendelian inheritance in genetic disease. Hum Mol Genet (2004) 13(2):R225–33.
  18. Mayeux R. Mapping the new frontier: complex genetic disorders. J Clin Invest (2005) 115:1404–7.
  19. Fuller MT. Interacting genes identify interacting proteins involved in microtubule function in Drosophila. Cell Motil Cytoskeleton (1989) 14:128–35.
  20. Stearns T, Botstein D. Unlinked noncomplementation: isolation of new conditional-lethal mutations in each of the tubulin genes of Saccharomyces cerevisiae. Genetics (1988) 119:249–60.
  21. Adie EA, Adams RR, Evans KL, et al. Speeding disease gene discovery by sequence based candidate prioritization. BMC Bioinformatics (2005) 6:55.
  22. Furney SJ, Higgins DG, Ouzounis CA, et al. Structural and functional properties of genes involved in human cancer. BMC Genomics (2006) 7:3.
  23. Lopez-Bigas N, Ouzounis CA. Genome-wide identification of genes likely to be involved in human genetic disease. Nucleic Acids Res (2004) 32:3108–14.
  24. Tu Z, Wang L, Xu M, et al. Further understanding human disease genes by comparing with housekeeping genes and other genes. BMC Genomics (2006) 7:31.
  25. Adie EA, Adams RR, Evans KL, et al. SUSPECTS: enabling fast and effective prioritization of positional candidates. Bioinformatics (2006) 22:773–4.
  26. Franke L, Bakel H, Fokkens L, et al. Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes. Am J Hum Genet (2006) 78:1011–25.
  27. George RA, Liu JY, Feng LL, et al. Analysis of protein sequence and interaction data for candidate disease gene prediction. Nucleic Acids Res (2006) 34:e130.
  28. Ma X, Lee H, Wang L, et al. CGI: a new approach for prioritizing genes by combining gene expression and protein-protein interaction data. Bioinformatics (2007) 23:215–21.
  29. Perez-Iratxeta C, Bork P, Andrade MA. Association of genes to genetically inherited diseases using data mining. Nat Genet (2002) 31:316–9.
  30. Perez-Iratxeta C, Wjst M, Bork P, et al. G2D: a tool for mining genes associated with disease. BMC Genet (2005) 6:45.
  31. Tiffin N, Kelso JF, Powell AR, et al. Integration of text- and data-mining using ontologies successfully selects disease gene candidates. Nucleic Acids Res (2005) 33:1544–52.
  32. Turner FS, Clutterbuck DR, Semple CAM. POCUS: mining genomic sequence annotation to predict disease genes. Genome Biol (2003) 4:R75.
  33. Rossi S, Masotti D, Nardini C, et al. TOM: a web-based integrated approach for identification of candidate disease genes. Nucleic Acids Res (2006) 34:W285–92.
  34. Aerts S, Lambrechts D, Maity S, et al. Gene prioritization through genomic data fusion. Nat Biotechnol (2006) 24:537–44.
  35. Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA (2005) 102:15545–50.
  36. van Driel MA, Cuelenaere K, Kemmeren PP, et al. GeneSeeker: extraction and integration of human disease-related information from web-based genetic databases. Nucleic Acids Res (2005) 33:W758–61.
  37. Tiffin N, Adie E, Turner F, et al. Computational disease gene identification: a concert of methods prioritizes type 2 diabetes and obesity candidate genes. Nucleic Acids Res (2006) 34:3067–81.
  38. Oti M, Snel B, Huynen MA, et al. Predicting disease genes using protein-protein interactions. J Med Genet (2006) 43:691–8.
  39. Oti M, Brunner H. The modular nature of genetic diseases. Clin Genet (2007) 71:1–11.
  40. Pinsky L. The polythetic (phenotypic community) system of classifying human malformation syndromes. Birth Defects Orig Artic Ser (1977) 13:13–30.
  41. Garcia-Higuera I, Taniguchi T, Ganesan S, et al. Interaction of the Fanconi anemia proteins and BRCA1 in a common pathway. Mol Cell (2001) 7:249–62.
  42. Mace G, Bogliolo M, Guervilly JH, et al. 3R coordination by Fanconi anemia proteins. Biochimie (2005) 87:647–58.
  43. Sam L, Liu Y, Jianrong L, et al. Discovery of protein interaction networks shared by diseases. Pac Symp Biocomput (2007) 12:76–87.
  44. Spivak G. The many faces of Cockayne syndrome. Proc Natl Acad Sci USA (2004) 101:15273–4.
  45. Lage K, Karlberg EO, Størling ZM, et al. A human phenome-interactome network of protein complexes implicated in genetic disorders. Nat Biotechnol (2007) 25:309–16.
  46. dbGAP. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gap (July 4, 2007, date last accessed).
  47. Lussier YA, Liu Y. Computational approaches to phenotyping: high-throughput phenomics. Proc Am Thorac Soc (2007) 4:18–25.
  48. Scriver CR. After the genome–the phenome? J Inherit Metab Dis (2004) 27:305–17.
  49. Butte AJ, Kohane IS. Creation and implications of a phenome-genome network. Nat Biotechnol (2006) 24:55–62.
  50. Brenner SE. A tour of structural genomics. Nat Rev Genet (2001) 2:801–9.
  51. Todd AE, Marsden RL, Thornton JM, et al. Progress of structural genomics initiatives: an analysis of solved target structures. J Mol Biol (2005) 348:1235–60.
  52. Berman HM, Westbrook J, Feng Z, et al. The protein data bank. Nucleic Acids Res (2000) 28:235–42.
  53. Bartlett GJ, Todd AE, Thornton JM. Inferring protein function from structure. Methods Biochem Anal (2003) 44:387–407.
  54. Ofran Y, Punta M, Schneider R, et al. Beyond annotation transfer by homology: novel protein-function prediction methods to assist drug discovery. Drug Discov Today (2005) 10:1475–82.
  55. The International HapMap Consortium. A haplotype map of the human genome. Nature (2005) 437:1299–320.
  56. Sunyaev S, Ramensky V, Bork P. Towards a structural basis of human non-synonymous single nucleotide polymorphisms. Trends Genet (2000) 16:198–200.
  57. Mooney S. Bioinformatics approaches and resources for single nucleotide polymorphism functional analysis. Brief Bioinform (2005) 6:44–56.
  58. Ng PC, Henikoff S. Accounting for human polymorphisms predicted to affect protein function. Genome Res (2002) 12:436–46.
  59. Saunders CT, Baker D. Evaluation of structural and evolutionary contributions to deleterious mutation prediction. J Mol Biol (2002) 322:891–901.
  60. Hamosh A, Scott AF, Amberger JS, et al. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res (2005) 33:D514–7.
  61. Lussier Y, Borlawsky T, Rappaport D, et al. PhenoGO: assigning phenotypic context to gene ontology annotations with natural language processing. Pac Symp Biocomput (2006) 11:64–75.
  62. Safran M, Chalifa-Caspi V, Shmueli O, et al. Human Gene-Centric Databases at the Weizmann Institute of Science: GeneCards, UDB, CroW 21 and HORDE. Nucleic Acids Res (2003) 31:142–6.
  63. Kahraman A, Avramov A, Nashev LG, et al. PhenomicDB: a multi-species genotype/phenotype database for comparative phenomics. Bioinformatics (2005) 21:418–20.
  64. Altman RB. PharmGKB: a logical home for knowledge relating genotype to drug response phenotype. Nat Genet (2007) 39:426.
  65. Yue P, Li Z, Moult J. Loss of protein structure stability as a major causative factor in monogenic disease. J Mol Biol (2005) 353:459–73.
  66. Steward RE, MacArthur MW, Laskowski RA, et al. Molecular basis of inherited diseases: a structural perspective. Trends Genet (2003) 19:505–13.
  67. Vitkup D, Sander C, Church GM. The amino-acid mutational spectrum of human genetic disease. Genome Biol (2003) 4:R72.
  68. Ye Y, Li Z, Godzik A. Modeling and analyzing three-dimensional structures of human disease proteins. Pac Symp Biocomput (2006) 439–50.
  69. Brauch H, Kishida T, Glavac D, et al. Von Hippel-Lindau (VHL) disease with pheochromocytoma in the Black Forest region of Germany: evidence for a founder effect. Hum Genet (1995) 95:551–6.
  70. Ohh M, Park CW, Ivan M, et al. Ubiquitination of hypoxia-inducible factor requires direct binding to the beta-domain of the von Hippel-Lindau protein. Nat Cell Biol (2000) 2:423–7.
  71. Martin AC, Facchiano AM, Cuff AL, et al. Integrating mutation data and structural analysis of the TP53 tumor-suppressor protein. Hum Mutat (2002) 19:149–64.
  72. Cho Y, Gorina S, Jeffrey PD, et al. Crystal structure of a p53 tumor suppressor-DNA complex: understanding tumorigenic mutations. Science (1994) 265:346–55.
  73. Ryan DP, Matthews JM. Protein-protein interactions in human disease. Curr Opin Struct Biol (2005) 15:441–6.
  74. Cheng Y, LeGall T, Oldfield CJ, et al. Abundance of intrinsic disorder in protein associated with cardiovascular disease. Biochemistry (2006) 45:10448–60.
  75. Iakoucheva LM, Brown CJ, Lawson JD, et al. Intrinsic disorder in cell-signaling and cancer-associated proteins. J Mol Biol (2002) 323:573–4.
  76. Dunker AK, Lawson JD, Brown CJ, et al. Intrinsically disordered protein. J Mol Graph Model (2001) 19:26–59.
  77. Schwarz-Linek U, Hook M, Potts JR. Fibronectin-binding proteins of gram-positive cocci. Microbes Infect (2006) 8:2291–8.
  78. Fu H. Protein-Protein Interactions: Methods and Applications (Methods in Molecular Biology). Fu H, ed. (2004) Totowa, New Jersey: Humana Press. 544.
  79. Ito T, Chiba T, Ozawa R, et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA (2001) 98:4569–74.
  80. Uetz P, Giot L, Cagney G, et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature (2000) 403:623–7.
  81. Gavin AC, Bosche M, Krause R, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature (2002) 415:141–7.
  82. Ho Y, Gruhler A, Heilbut A, et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature (2002) 415:180–3.
  83. Gavin AC, Aloy P, Grandi P, et al. Proteome survey reveals modularity of the yeast cell machinery. Nature (2006) 440:631–6.
  84. Krogan NJ, Cagney G, Yu H, et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature (2006) 440:637–43.
  85. Titz B, Schlesner M, Uetz P. What do we learn from high-throughput protein interaction data? Expert Rev Proteomics (2004) 1:111–21.
  86. Shoemaker BA, Panchenko AR. Deciphering protein–protein interactions. Part I. Experimental techniques and databases. PLoS Comput Biol (2007) 3:e42.
  87. Huynen MA, Bork P. Measuring genome evolution. Proc Natl Acad Sci USA (1998) 95:5849–56.
  88. Pellegrini M, Marcotte EM, Thompson MJ, et al. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci USA (1999) 96:4285–8.
  89. Dandekar T, Snel B, Huynen M, et al. Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci (1998) 23:324–8.
  90. Overbeek R, Fonstein M, D'Souza M, et al. Use of contiguity on the chromosome to predict functional coupling. In Silico Biol (1999) 1:93–108.
  91. Marcotte EM, Pellegrini M, Ng HL, et al. Detecting protein function and protein-protein interactions from genome sequences. Science (1999) 285:751–3.
  92. Enright AJ, Iliopoulos I, Kyrpides NC, et al. Protein interaction maps for complete genomes based on gene fusion events. Nature (1999) 402:86–90.
  93. Shoemaker BA, Panchenko AR. Deciphering protein–protein interactions. Part II. Computational methods to predict protein and domain interaction partners. PLoS Comput Biol (2007) 3:e42.
  94. Valencia A, Pazos F. Computational methods for the prediction of protein interactions. Curr Opin Struct Biol (2002) 12:368–73.
  95. Rost B, Liu J, Nair R, et al. Automatic prediction of protein function. Cell Mol Life Sci (2003) 60:2637–50.
  96. Shoemaker BA, Panchenko AR. Deciphering protein-protein interactions. Part II. Computational methods to predict protein and domain interaction partners. PLoS Comput Biol (2007) 3:e43.
  97. Gertz J, Elfond G, Shustrova A, et al. Inferring protein interactions from phylogenetic distance matrices. Bioinformatics (2003) 19:2039–45.
  98. Goh CS, Bogan AA, Joachimiak M, et al. Co-evolution of proteins with their interaction partners. J Mol Biol (2000) 299:283–93.
  99. Goh CS, Cohen FE. Co-evolutionary analysis reveals insights into protein-protein interactions. J Mol Biol (2002) 324:177–92.
  100. Jothi R, Kann MG, Przytycka TM. Predicting protein-protein interaction by searching evolutionary tree automorphism space. Bioinformatics (2005) 21(Suppl 1):i241–50.
  101. Pazos F, Helmer-Citterich M, Ausiello G, et al. Correlated mutations contain information about protein-protein interaction. J Mol Biol (1997) 271:511–23.
  102. Pazos F, Valencia A. In silico two-hybrid system for the selection of physically interacting protein pairs. Proteins (2002) 47:219–27.
  103. Ramani AK, Marcotte EM. Exploiting the co-evolution of interacting proteins to discover interaction specificity. J Mol Biol (2003) 327:273–84.
  104. Jothi R, Cherukuri PF, Tasneem A, et al. Co-evolutionary analysis of domains in interacting proteins reveals insights into domain-domain interactions mediating protein-protein interactions. J Mol Biol (2006) 362:861–75.
  105. Pazos F, Ranea JA, Juan D, et al. Assessing protein co-evolution in the context of the tree of life assists in the prediction of the interactome. J Mol Biol (2005) 352:1002–15.
  106. Sato T, Yamanishi Y, Kanehisa M, et al. The inference of protein-protein interactions by co-evolutionary analysis is improved by excluding the information about the phylogenetic relationships. Bioinformatics (2005) 21:3482–9.
  107. Kann MG, Jothi R, Cherukuri PF, et al. Predicting protein domain interactions from coevolution of conserved regions. Proteins (2007) 67:811–20.
  108. Barabasi AL, Oltvai ZN. Network biology: understanding the cell's functional organization. Nat Rev Genet (2004) 5:101–13.
  109. Grindrod P, Kibble M. Review of uses of network and graph theory concepts within proteomics. Expert Rev Proteomics (2004) 1:229–38.
  110. Yook SH, Oltvai ZN, Barabasi AL. Functional and topological characterization of protein interaction networks. Proteomics (2004) 4:928–42.
  111. Albert R, Jeong H, Barabasi AL. Error and attack tolerance of complex networks. Nature (2000) 406:378–82.
  112. Xu J, Li Y. Discovering disease-genes by topological features in human protein-protein interaction network. Bioinformatics (2006) 22:2800–05.
  113. Huynen MA, Snel B, von Mering C, et al. Function prediction and protein networks. Curr Opin Cell Biol (2003) 15:191–8.
  114. Droit A, Poirier GG, Hunter JM. Experimental and bioinformatic approaches for interrogating protein-protein interactions to determine protein function. J Mol Endocrinol (2005) 34:263–80.
  115. Jiang Z, Zhou Y. Using bioinformatics for drug target identification from the genome. Am J Pharmacogenomics (2005) 5:387–96.
  116. Goehler H, Lalowski M, Stelzl U, et al. A protein interaction network links GIT1, an enhancer of huntingtin aggregation, to Huntington's disease. Mol Cell (2004) 15:853–65.
  117. Herbst M, Wanker EE. Therapeutic approaches to polyglutamine diseases: combating protein misfolding and aggregation. Curr Pharm Des (2006) 12:2543–55.
  118. Duennwald ML, Jagadish S, Giorgini F, et al. A network of protein interactions determines polyglutamine toxicity. Proc Natl Acad Sci USA (2006) 103:11051–6.
  119. Giorgini F, Muchowski PJ. Connecting the dots in Huntington's disease with protein interaction networks. Genome Biol (2005) 6:210.
  120. Lim J, Hao T, Shaw C, et al. A protein-protein interaction network for human inherited ataxias and disorders of Purkinje cell degeneration. Cell (2006) 125:801–14.
  121. Gandhi TK, Zhong J, Mathivanan S, et al. Analysis of the human protein interactome and comparison with yeast, worm and fly interaction datasets. Nat Genet (2006) 38:285–93.
  122. Chen JY, Shen C, Sivachenko AY. Mining Alzheimer disease relevant proteins from integrated protein interactome data. Pac Symp Biocomput (2006) 11:367–78.
  123. Chiti F, Dobson CM. Protein misfolding, functional amyloid, and human disease. Annu Rev Biochem (2006) 75:333–66.[CrossRef][Web of Science][Medline]
  124. Jonsson PF, Bates PA. Global topological features of cancer proteins in the human interactome. Bioinformatics (2006) 22:2291–7.[Abstract/Free Full Text]
  125. Uetz P, Dong YA, Zeretzke C, et al. Herpesviral protein networks and their interaction with the human proteome. Science (2006) 311:239–42.[Abstract/Free Full Text]
  126. Goh KI, Cusick ME, Valle D, et al. The human disease network. Proc Natl Acad Sci USA (2007) 104:8685–90.[Abstract/Free Full Text]
  127. Mika S, Rost B. Protein-protein interactions more conserved within species than across species. PLoS Comput Biol (2006) 2:e79.[CrossRef][Medline]
  128. Willingham S, Outeiro TF, DeVit MJ, et al. Yeast genes that enhance the toxicity of a mutant huntingtin fragment or alpha-synuclein. Science (2003) 302:1769–72.[Abstract/Free Full Text]
  129. Giorgini F, Guidetti P, Nguyen Q, et al. A genomic screen in yeast implicates kynurenine 3-monooxygenase as a therapeutic target for Huntington disease. Nat Genet (2005) 37:526–31.[CrossRef][Web of Science][Medline]
  130. Kazemi-Esfarjani P, Benzer S. Genetic suppression of polyglutamine toxicity in Drosophila. Science (2000) 287:1837–40.[Abstract/Free Full Text]
  131. Nollen EA, Garcia SM, van Haaften G, et al. Genome-wide RNA interference screen identifies previously undescribed regulators of polyglutamine aggregation. Proc Natl Acad Sci USA (2004) 101:6403–8.[Abstract/Free Full Text]
  132. Yue P, Melamud E, Moult J. SNPs3D: candidate gene and SNP selection for association studies. BMC Bioinformatics (2006) 7:166.[CrossRef][Medline]
  133. Dantzer J, Moad C, Heiland R, et al. MutDB services: interactive structural analysis of mutation data. Nucleic Acids Res (2005) 33:W311–4.[Abstract/Free Full Text]
  134. Mathe E, Olivier M, Kato S, et al. Computational approaches for predicting the biological effect of p53 missense mutations: a comparison of three sequence analysis based methods. Nucleic Acids Res (2006) 34:1317–25.[Abstract/Free Full Text]
  135. Ng PC, Henikoff S. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res (2003) 31:3812–4.[Abstract/Free Full Text]
  136. O'Brien KP, Westerlund I, Sonnhammer EL. OrthoDisease: a database of human disease orthologs. Hum Mutat (2004) 24:112–9.[CrossRef][Web of Science][Medline]
  137. Shannon P, Markiel A, Ozier O, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res (2003) 13:2498–504.[Abstract/Free Full Text]
