Similar articles
Found 20 similar articles (search time: 31 ms)
1.
Inference of protein functions is one of the most important aims of modern biology. To fully exploit the large volumes of genomic data typically produced in modern-day genomic experiments, automated computational methods for protein function prediction are urgently needed. Established methods use sequence or structure similarity to infer functions, but those types of data do not suffice to determine the biological context in which proteins act. Current high-throughput biological experiments produce large amounts of data on the interactions between proteins. Such data can be used to infer interaction networks and to predict the biological process that the protein is involved in. Here, we develop a probabilistic approach for protein function prediction using network data, such as protein-protein interaction measurements. We take a Bayesian approach to an existing Markov Random Field method by performing simultaneous estimation of the model parameters and prediction of protein functions. We use an adaptive Markov Chain Monte Carlo algorithm that leads to more accurate parameter estimates and consequently to improved prediction performance compared to the standard Markov Random Fields method. We tested our method using a high-quality S. cerevisiae validation network with 1622 proteins against 90 Gene Ontology terms of different levels of abstraction. Compared to three other protein function prediction methods, our approach shows very good prediction performance. Our method can be directly applied to protein-protein interaction or coexpression networks, but can also be extended to use multiple data sources. We apply our method to physical protein interaction data from S. cerevisiae and provide novel predictions, using 340 Gene Ontology terms, for 1170 unannotated proteins, and we evaluate the predictions using the available literature.

2.
Identification of extracellular ligand-receptor interactions is important for drug design and the treatment of diseases. Difficulties in detecting these interactions using high-throughput experimental techniques motivate the development of computational prediction methods. We propose a novel threading algorithm, LTHREADER, which generates accurate local sequence-structure interface alignments and integrates various statistical scores and experimental binding data to predict interactions within ligand-receptor families. LTHREADER uses a profile of secondary structure and solvent accessibility predictions with residue contact maps to guide and constrain alignments. Using a decision tree classifier and low-throughput experimental data for training, it combines information inferred from statistical interaction potentials, energy functions, correlated mutations, and conserved residue pairs to predict interactions. We apply our method to cytokines, which play a central role in the development of many diseases including cancer and inflammatory and autoimmune disorders. We tested our approach on two representative families from different structural classes (all-alpha and all-beta proteins) of cytokines. In comparison with the state-of-the-art threader RAPTOR, LTHREADER generates on average 20% more accurate alignments of interacting residues. Furthermore, in cross-validation tests, LTHREADER correctly predicts experimentally confirmed interactions for a common binding mode within the 4-helical long-chain cytokine family with 75% sensitivity and 86% specificity with 40% gain in sensitivity compared to RAPTOR. For the TNF-like family our method achieves 70% sensitivity with 55% specificity with 70% gain in sensitivity. LTHREADER combines information from multiple complex templates when such data are available. When only one solved structure is available, a localized PSI-BLAST approach also outperforms standard threading methods with 25%-50% improvements in sensitivity.  

3.
Kaur H, Raghava GP. FEBS Letters 2004, 564(1-2): 47-57
In this study, an attempt has been made to develop a neural network-based method for predicting segments in proteins containing aromatic-backbone NH (Ar-NH) interactions using multiple sequence alignment. We have analyzed 3121 segments seven residues long containing Ar-NH interactions, extracted from 2298 non-redundant protein structures where no two proteins have more than 25% sequence identity. Two consecutive feed-forward neural networks with a single hidden layer have been trained with standard back-propagation as the learning algorithm. The performance of the method improves from 0.12 to 0.15 in terms of Matthews correlation coefficient (MCC) value when evolutionary information (multiple alignment obtained from PSI-BLAST) is used as input instead of a single sequence. The performance of the method further improves from MCC 0.15 to 0.20 when secondary structure information predicted by PSIPRED is incorporated in the prediction. The final network yields an overall prediction accuracy of 70.1% and an MCC of 0.20 when tested by five-fold cross-validation. Overall the performance is 15.2% higher than the random prediction. The method consists of two neural networks: (i) a sequence-to-structure network which predicts the aromatic residues involved in Ar-NH interaction from multiple alignment of protein sequences and (ii) a structure-to-structure network where the input consists of the output obtained from the first network and predicted secondary structure. Further, the actual position of the donor residue within the 'potential' predicted fragment has been predicted using a separate sequence-to-structure neural network. Based on the present study, a server Ar_NHPred has been developed which predicts Ar-NH interaction in a given amino acid sequence. The web server Ar_NHPred is available at and (mirror site).
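The abstract above reports its results as accuracy and Matthews correlation coefficient (MCC). As a minimal sketch of how those two figures are derived from a binary confusion matrix, the helper names `matthews_corrcoef` and `accuracy` below are illustrative assumptions and are not taken from the Ar_NHPred server; the MCC formula itself is the standard one.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard MCC from confusion-matrix counts; returns 0.0 when undefined."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is reported as 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)
```

MCC ranges from -1 to 1 and stays near 0 for a predictor that ignores the input, which is why it is often quoted alongside accuracy when the positive and negative classes are unbalanced.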

4.
The nucleotide sequence of 1179 bp preceding the trp operon genes has been established. There are no open reading frames large enough to code for proteins containing more than 97 amino acid residues. In all cases the coding sequences do not contain the initiation codons. The determined sequence is concluded to represent an intercistronic region.

5.
Recent advances in functional genomics have helped generate large-scale high-throughput protein interaction data. Such networks, though extremely valuable for a molecular-level understanding of cells, do not provide any direct information about the regions (domains) in the proteins that mediate the interaction. Here, we performed co-evolutionary analysis of domains in interacting proteins in order to understand the degree of co-evolution of interacting and non-interacting domains. Using a combination of sequence and structural analysis, we analyzed protein-protein interactions in F1-ATPase, Sec23p/Sec24p, DNA-directed RNA polymerase and nuclear pore complexes, and found that the interacting domain pair(s) for a given interaction exhibit a higher level of co-evolution than the non-interacting domain pairs. Motivated by this finding, we developed a computational method to test the generality of the observed trend, and to predict large-scale domain-domain interactions. Given a protein-protein interaction, the proposed method predicts the domain pair(s) that is most likely to mediate the protein interaction. We applied this method to the yeast interactome to predict domain-domain interactions, and used known domain-domain interactions found in PDB crystal structures to validate our predictions. Our results show that the prediction accuracy of the proposed method is statistically significant. Comparison of our prediction results with those from two other methods reveals that only a fraction of predictions are shared by all three methods, indicating that the proposed method can detect known interactions missed by other methods. We believe that the proposed method can be used with other methods to help identify previously unrecognized domain-domain interactions on a genome scale, and could potentially help reduce the search space for identifying interaction sites.

6.
The influence of long-range residue interactions on defining secondary structure in a protein has long been discussed and is often cited as the current limitation to accurate secondary structure prediction. There are several experimental examples where a local sequence alone is not sufficient to determine its secondary structure, but a comprehensive survey on a large data set has not yet been done. Interestingly, some earlier studies denied the negative effect of long-range interactions on secondary structure prediction accuracy. Here, we have introduced the residue contact order (RCO), which directly indicates the separation of contacting residues in terms of the position in the sequence, and examined the relationship between the RCO and the prediction accuracy. A large data set of 2777 nonhomologous proteins was used in our analysis. Unlike previous studies, we do find that prediction accuracy drops as residues have contacts with more distant residues. Moreover, this negative correlation between the RCO and the prediction accuracy was found not only for beta-strands, but also for alpha-helices. The prediction accuracy of beta-strands is lower if residues have a high RCO or a low RCO, which corresponds to the situation that a beta-sheet is formed by beta-strands from different chains in a protein complex. The reason why the current study draws the opposite conclusion from the previous studies is examined. The implication for protein folding is also discussed.
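The abstract describes the residue contact order only qualitatively, as the sequence separation of contacting residues. A minimal sketch of one plausible reading follows: for each residue, average the sequence separation |i - j| over all residues j within a distance cutoff. The function name, the 8 Å cutoff, the `min_sep` parameter and the use of one coordinate per residue are all assumptions made for illustration; the paper's exact definition may differ.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def residue_contact_order(
    coords: Sequence[Coord],
    cutoff: float = 8.0,  # assumed contact distance in angstroms
    min_sep: int = 1,     # assumed minimum sequence separation for a contact
) -> List[float]:
    """Per-residue mean sequence separation of spatial contacts (hypothetical RCO)."""
    n = len(coords)
    rco = []
    for i in range(n):
        separations = [
            abs(i - j)
            for j in range(n)
            if abs(i - j) >= min_sep and math.dist(coords[i], coords[j]) <= cutoff
        ]
        # A residue with no contacts is given an RCO of 0.0 in this sketch.
        rco.append(sum(separations) / len(separations) if separations else 0.0)
    return rco
```

Under this reading, a residue in an extended chain that touches only its sequence neighbours has an RCO of 1, while a residue packed against a distant part of the chain has a larger value.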

7.
Bhardwaj N, Lu H. FEBS Letters 2007, 581(5): 1058-1066
Protein-DNA interactions are crucial to many cellular activities such as expression control and DNA repair. These interactions between amino acids and nucleotides are highly specific, and any aberration at the binding site can render the interaction completely incompetent. In this study, we have three aims focusing on DNA-binding residues on the protein surface: to develop an automated approach for fast and reliable recognition of DNA-binding sites; to improve the prediction by distance-dependent refinement; and to use these predictions to identify DNA-binding proteins. We use a support vector machine (SVM)-based approach to harness the features of the DNA-binding residues to distinguish them from non-binding residues. Features used for distinction include the residue's identity, charge, solvent accessibility, average potential, the secondary structure it is embedded in, neighboring residues, and location in a cationic patch. These features, collected from 50 proteins, are used to train the SVM. Testing is then performed on another set of 37 proteins, much larger than any testing set used in previous studies. The testing set has no more than 20% sequence identity not only among its pairs, but also with the proteins in the training set, thus removing any undesired redundancy due to homology. This set also has proteins with an unseen DNA-binding structural class not present in the training set. With the above features, an accuracy of 66% with balanced sensitivity and specificity is achieved without relying on homology or evolutionary information. We then develop a post-processing scheme to improve the prediction using the relative location of the predicted residues. Balanced success is then achieved with average sensitivity, specificity and accuracy pegged at 71.3%, 69.3% and 70.5%, respectively. Average net prediction is also around 70%.
Finally, we show that the number of predicted DNA-binding residues can be used to differentiate DNA-binding proteins from non-DNA-binding proteins with an accuracy of 78%. Results presented here demonstrate that machine learning can be applied to automated identification of DNA-binding residues and that the success rate can be improved as more features are added. Such functional site prediction protocols can be useful in guiding subsequent work such as site-directed mutagenesis and macromolecular docking.

8.
Protein co-evolution under structural and functional constraints necessitates the preservation of important interactions. Identifying functionally important regions poses many obstacles in protein engineering efforts. In this paper, we present a bioinformatics-inspired approach (residue correlation analysis, RCA) for predicting functionally important domains from protein family sequence data. RCA comprises two major steps: (i) identifying pairs of residue positions that mutate in a coordinated manner, and (ii) using these results to identify protein regions that interact with an uncommonly high number of other residues. We hypothesize that strongly correlated pairs result not only from contacting pairs, but also from residues that participate in conformational changes involved during catalysis or important interactions necessary for retaining functionality. The results show that highly mobile loops that assist in ligand association/dissociation tend to exhibit high correlation. RCA results exhibit good agreement with the findings of experimental and molecular dynamics studies for the three protein families that are analyzed: (i) DHFR (dihydrofolate reductase), (ii) cyclophilin, and (iii) formyl-transferase. Specifically, the specificity (percentage of correct predictions) in all three cases is substantially higher than those obtained by entropic measures or contacting residue pairs. In addition, we use our approach in a predictive fashion to identify important regions of a transmembrane amino acid transporter protein for which there is limited structural and functional information available.
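The two RCA steps named in the abstract, scoring coordinated mutation between position pairs and then counting how many partners each position has, can be sketched as follows. The abstract does not say which correlation measure RCA uses, so mutual information between alignment columns is swapped in here as a commonly used stand-in; the function names, the toy alignment layout (equal-length strings) and the threshold are all assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence


def column(msa: Sequence[str], k: int) -> List[str]:
    """Extract alignment column k (one character per sequence)."""
    return [seq[k] for seq in msa]


def mutual_information(col_a: Sequence[str], col_b: Sequence[str]) -> float:
    """Mutual information in bits between two alignment columns."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    # p(a,b) * log2(p(a,b) / (p(a) p(b))), written with raw counts.
    return sum(
        (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )


def partner_counts(msa: Sequence[str], threshold: float) -> List[int]:
    """Step (ii): number of positions each position is correlated with."""
    length = len(msa[0])
    counts = [0] * length
    for i in range(length):
        for j in range(i + 1, length):
            if mutual_information(column(msa, i), column(msa, j)) >= threshold:
                counts[i] += 1
                counts[j] += 1
    return counts
```

Positions with unusually high partner counts would be the candidates for functionally important regions in this simplified picture; a real analysis would also need to correct for phylogenetic bias and small sample size, which this sketch ignores.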

9.
Sibling cannibalism occurs in many species, yet understanding of sibling cannibalism as an adaptation currently lags behind understanding of other antagonistic interactions among siblings. Observed sibling cannibalism phenotypes likely reflect the interaction between competitive games among siblings and parent-offspring conflict. Using a game-theoretic approach, we derive optimal offspring cannibalism behaviour and parental modifiers that limit or facilitate cannibalism. The results are compared to contemporary frequency-independent analysis. With the addition of game interactions among siblings or parent-offspring co-evolution, our model predicts increased cannibalism (compared to the frequency-independent prediction), as offspring compete to eat siblings. When infertile eggs are present--strengthening competition--offspring risk eating viable siblings in order to gain access to infertile eggs, intensifying parent-offspring conflict. We use the results to make new predictions about the occurrence of sibling cannibalism. Additionally, we demonstrate the utility of trophic egg laying as a maternal mechanism to promote egg eating.

10.
11.
Pyrococcus furiosus has an operon containing the DNA polymerase II (PolD) gene and three other genes. Using two-hybrid screening to examine the interactions of the proteins encoded by the operon, we identified a specific interaction between the second subunit of PolD (DP1) and a Rad51/Dmc1 homologous protein (RadB). To confirm the specific interaction between these two proteins, each gene in the operon was expressed in Escherichia coli or insect cells separately and the products were purified. The in vitro analyses using the purified proteins also showed the interaction between DP1 and RadB. The deletion mutant analysis of DP1 revealed that a region important for binding with RadB is located in the central part of the sequence (amino acid residues 206-498). This region overlaps the C-terminal half (amino acids 334-613), which is highly conserved among euryarchaeal DP1s and is essential for the activity of PolD. Our results suggest that, although RadB does not noticeably affect the primer extension ability of PolD in vitro, PolD may utilize the RadB protein in DNA synthesis under certain conditions.

12.
Polyketides, a diverse group of heteropolymers with antibiotic and antitumor properties, are assembled in bacteria by multiprotein chains of modular polyketide synthase (PKS) proteins. Specific protein-protein interactions determine the order of proteins within a multiprotein chain, and thereby the order in which chemically distinct monomers are added to the growing polyketide product. Here we investigate the evolutionary and molecular origins of protein interaction specificity. We focus on the short, conserved N- and C-terminal docking domains that mediate interactions between modular PKS proteins. Our computational analysis, which combines protein sequence data with experimental protein interaction data, reveals a hierarchical interaction specificity code. PKS docking domains are descended from a single ancestral interacting pair, but have split into three phylogenetic classes that are mutually noninteracting. Specificity within one such compatibility class is determined by a few key residues, which can be used to define compatibility subclasses. We identify these residues using a novel, highly sensitive co-evolution detection algorithm called CRoSS (correlated residues of statistical significance). The residue pairs selected by CRoSS are involved in direct physical interactions in a docked-domain NMR structure. A single PKS system can use docking domain pairs from multiple classes, as well as domain pairs from multiple subclasses of any given class. The termini of individual proteins are frequently shuffled, but docking domain pairs straddling two interacting proteins are linked as an evolutionary module. The hierarchical and modular organization of the specificity code is intimately related to the processes by which bacteria generate new PKS pathways.

13.
14.
A recombinant cosmid carrying the Methanobacterium thermoautotrophicum Marburg trp genes was selected by complementation of Escherichia coli trp mutations. A 7.3-kb fragment of the cloned archaeal DNA was sequenced. It contained the seven trp genes, arranged adjacent to each other in the order trpEGCFBAD. No gene fusions were observed. The trp genes were organized in an operon-like structure, with four short (5- to 56-bp) intergenic regions and two overlapping genes. There was no indication of an open reading frame encoding a leader peptide in the upstream region of trpE. The gene order observed in the M. thermoautotrophicum trp operon was different from all known arrangements of the trp genes in archaea, bacteria, and eucarya. The encoded sequences of the Methanobacterium Trp proteins were similar in size to their bacterial and eucaryal counterparts, and all of them contained the segments of highly similar or invariant amino acid residues recognized in the Trp enzymes from bacteria and eucarya. The TrpE, TrpG, TrpC, TrpA, and TrpD proteins were 30 to 50% identical to those from representatives of other species. Significantly less sequence conservation (18 to 30%) was observed for TrpF, and TrpB exhibited a high degree of identity (50 to 62%) to the sequences of representatives of the three domains. With the exception of TrpB, the beta subunit of tryptophan synthase, tryptophan was absent from all Trp polypeptides.

15.
The "ribose zipper", an important element of RNA tertiary structure, is characterized by consecutive hydrogen-bonding interactions between ribose 2'-hydroxyls from different regions of an RNA chain or between RNA chains. These tertiary contacts have previously been observed to also involve base-backbone and base-base interactions (A-minor type). We searched for ribose zipper tertiary interactions in the crystal structures of the large ribosomal subunit RNAs of Haloarcula marismortui and Deinococcus radiodurans, and the small ribosomal subunit RNA of Thermus thermophilus and identified a total of 97 ribose zippers. Of these, 20 were found in T. thermophilus 16 S rRNA, 44 in H. marismortui 23 S rRNA (plus 2 bridging 5 S and 23 S rRNAs) and 30 in D. radiodurans 23 S rRNA (plus 1 bridging 5 S and 23 S rRNAs). These were analyzed in terms of sequence conservation, structural conservation and stability, location in secondary structure, and phylogenetic conservation. Eleven types of ribose zippers were defined based on ribose-base interactions. Of these 11, seven were observed in the ribosomal RNAs. The most common of these is the canonical ribose zipper, originally observed in the P4-P6 group I intron fragment. All ribose zippers were formed by antiparallel chain interactions and only a single example extended beyond two residues, forming an overlapping ribose zipper of three consecutive residues near the small subunit A-site. Almost all ribose zippers link stem (Watson-Crick duplex) or stem-like (base-paired), with loop (external, internal, or junction) chain segments. About two-thirds of the observed ribose zippers interact with ribosomal proteins. Most of these ribosomal proteins bridge the ribose zipper chain segments with basic amino acid residues hydrogen bonding to the RNA backbone. Proteins involved in crucial ribosome function and in early stages of ribosomal assembly also stabilize ribose zipper interactions. 
All ribose zippers show strong sequence conservation both within these three ribosomal RNA structures and in a large database of aligned prokaryotic sequences. The physical basis of the sequence conservation is stacked base triples formed between consecutive base-pairs on the stem or stem-like segment with bases (often adenines) from the loop-side segment. These triples have previously been characterized as Type I and Type II A-minor motifs and are stabilized by base-base and base-ribose hydrogen bonds. The sequence and structure conservation of ribose zippers can be directly used in tertiary structure prediction and may have applications in molecular modeling and design.

16.
Allosteric interactions between residues that are spatially apart and well separated in sequence are important in the function of multimeric proteins as well as single-domain proteins. This observation suggests that, among the residues that are involved in long-range communications, mutation at one site should affect interactions at a distant site. By adopting a sequence-based approach, we present an automated approach that uses a generalization of the familiar sequence entropy in conjunction with a coupled two-way clustering algorithm, to predict the network of interactions that trigger allosteric interactions in proteins. We use the method to identify the subset of dynamically important residues in three families, namely, the small PDZ family, G protein-coupled receptors (GPCR), and the Lectins, which are cell-adhesion receptors that mediate the tethering and rolling of leukocytes on inflamed endothelium. For the PDZ and GPCR families, our procedure predicts, in agreement with previous studies, a network containing a small number of residues that are involved in their function. Application to the Lectin family reveals a network of residues interspersed throughout the C-terminal end of the structure that are responsible for binding to ligands. Based on our results and previous studies, we propose that functional robustness requires that only a small subset of distantly connected residues be involved in transmitting allosteric signals in proteins.
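The method above builds on "the familiar sequence entropy". As a brief sketch, that familiar quantity is the Shannon entropy of each alignment column; the generalization and the coupled two-way clustering used in the paper are not described in the abstract and are not reproduced here. The function names and the toy alignment are illustrative assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_entropy(col: Sequence[str]) -> float:
    """Shannon entropy in bits of the residue distribution in one column."""
    n = len(col)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(col).values())


def alignment_entropies(msa: Sequence[str]) -> List[float]:
    """Entropy of every column of an alignment given as equal-length strings."""
    return [column_entropy([seq[k] for seq in msa]) for k in range(len(msa[0]))]
```

A fully conserved column scores 0 bits, and a column with four equally frequent residues scores 2 bits.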

17.
Identifying protein–protein interfaces is crucial for structural biology. Because of the constraints of wet-lab experiments, many computational methods have been proposed. Here we propose a new method that predicts protein–protein interaction interface residues in heterocomplexes purely from evolutionary information, without any knowledge of the partner chains. Unlike traditional approaches using multiple sequence alignment profiles to represent the conservation level for each residue, we make predictions based on the concept of residue conservation scores, so that the dimension of the feature vector for each residue is drastically reduced, to at least 20 times smaller than in conventional methods. Based on this representation, a simple linear discriminant function is used to make predictions, so the computational complexity of the whole prediction procedure is also greatly decreased. Tests on 69 heterocomplex chains demonstrate that the performance of our approach is indeed superior to existing methods.

18.
Predicted protein-protein interaction sites from local sequence information   Cited by: 2 (self-citations: 0, citations by others: 2)
Ofran Y, Rost B. FEBS Letters 2003, 544(1-3): 236-239
Protein-protein interactions are facilitated by a myriad of residue-residue contacts on the interacting proteins. Identifying the site of interaction in the protein is key to deciphering its functional mechanisms, and is crucial for drug development. Many studies indicate that the compositions of contacting residues are unique. Here, we describe a neural network that identifies protein-protein interfaces from sequence. For the most strongly predicted sites (in 34 of 333 proteins), 94% of the predictions were confirmed experimentally. When 70% of our predictions were right, we correctly predicted at least one interaction site in 20% of the complexes (66/333). These results indicate that the prediction of some interaction sites from sequence alone is possible. Incorporating evolutionary and predicted structural information may improve our method. However, even at this early stage, our tool might already assist wet-lab biology.

19.
MOTIVATION: Interacting pairs of proteins should co-evolve to maintain functional and structural complementarity. Consequently, such a pair of protein families shows similarity between their phylogenetic trees. Although the tendency of co-evolution has been known for various ligand-receptor pairs, it has not been studied systematically in the widest possible scope. We investigated the degree of co-evolution for more than 900 family pairs in a global protein structural interactome map (PSIMAP--a map of all the structural domain-domain interactions in the PDB). RESULTS: There was significant correlation in 45% of the total SCOP family-level pairs, rising to 78% in 454 reliable family interactions. As expected, the intra-molecular interactions between protein families showed stronger co-evolution than inter-molecular interactions. However, both types of interaction have a fundamentally similar pattern of co-evolution except for cases where different interfaces are involved. These results validate the use of co-evolution analysis with predictive methods such as PSIMAP to improve the accuracy of prediction based on "homologous interaction". The tendency of co-evolution enabled a nearly 5-fold enrichment in the identification of true interactions among the potential interlogues in PSIMAP. The estimated sensitivity was 79.2%, and the specificity was 78.6%. AVAILABILITY: The results of co-evolution analysis are available online at http://www.biointeraction.org
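The abstract measures co-evolution as similarity between the phylogenetic trees of two protein families, without stating the exact statistic. One widely used way to quantify such similarity, assumed here purely for illustration, is the Pearson correlation between the two families' pairwise distance matrices, with rows and columns ordered by the same species. The function names and the toy matrices are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def upper_triangle(m: Matrix) -> List[float]:
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def coevolution_score(dist_a: Matrix, dist_b: Matrix) -> float:
    """Correlation of two families' distance matrices (same species order assumed)."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))
```

A score near 1 means the two families have diverged in step across the shared species; deciding whether a given score is significant would need a background distribution, which this sketch does not provide.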

20.
Accurate identification of strand residues aids prediction and analysis of numerous structural and functional aspects of proteins. We propose a sequence-based predictor, BETArPRED, which improves prediction of strand residues and β-strand segments. BETArPRED uses a novel design that accepts strand residues predicted by SSpro and predicts the remaining positions utilizing a logistic regression classifier with nine custom-designed features. These are derived from the primary sequence, the secondary structure (SS) predicted by SSpro, PSIPRED and SPINE, and residue depth as predicted by RDpred. Our features utilize certain local (window-based) patterns in the predicted SS and combine information about the predicted SS and residue depth. BETArPRED is evaluated on 432 sequences that share low identity with the training chains, and on the CASP8 dataset. We compare BETArPRED with seven modern SS predictors, and the top-performing automated structure predictor in CASP8, the ZHANG-server. BETArPRED provides statistically significant improvements over each of the SS predictors; it improves prediction of strand residues and β-strands, and it finds β-strands that were missed by the other methods. When compared with the ZHANG-server, we improve predictions of strand segments and predict more actual strand residues, while the other predictor achieves a higher rate of correct strand residue predictions when under-predicting them.
