Similar articles
20 similar articles found (search time: 15 ms)
1.
With the development of bioinformatics, more and more protein sequence information has become available. Meanwhile, the number of known protein–protein interactions (PPIs) is still very limited. In this article, we propose a new method for predicting interacting protein pairs using a Bayesian method based on a new feature representation. We trained our model using data on 6,459 PPI pairs from the yeast Saccharomyces cerevisiae core subset. On six species from the DIP database, our model achieves an average prediction accuracy of 93.67%. The results show that our method is superior to other methods in both computing time and prediction accuracy.

2.
3.
Yi N, Shriner D, Banerjee S, Mehta T, Pomp D, Yandell BS. Genetics 2007, 176(3):1865-1877
We extend our Bayesian model selection framework for mapping epistatic QTL in experimental crosses to include environmental effects and gene-environment interactions. We propose a new, fast Markov chain Monte Carlo algorithm to explore the posterior distribution of unknowns. In addition, we take advantage of any prior knowledge about genetic architecture to increase posterior probability on more probable models. These enhancements have significant computational advantages in models with many effects. We illustrate the proposed method by detecting new epistatic and gene-sex interactions for obesity-related traits in two real data sets of mice. Our method has been implemented in the freely available package R/qtlbim (http://www.qtlbim.org) to facilitate the general usage of the Bayesian methodology for genomewide interacting QTL analysis.

4.
In this paper we address the problem of extracting features relevant for predicting protein-protein interaction sites from the three-dimensional structures of protein complexes. Our approach is based on information about evolutionary conservation and surface disposition. We implement a neural network based system, which uses a cross validation procedure and allows the correct detection of 73% of the residues involved in protein interactions in a selected database comprising 226 heterodimers. Our analysis confirms that the chemico-physical properties of interacting surfaces are difficult to distinguish from those of the whole protein surface. However, neural networks trained with a reduced representation of the interacting patch and sequence profile are sufficient to generalize over the different features of the contact patches and to predict whether a residue in the protein surface is or is not in contact. By using a blind test, we report the prediction of the surface interacting sites of three structural components of the DnaK molecular chaperone system, and find close agreement with previously published experimental results. We propose that the predictor can significantly complement results from structural and functional proteomics.

5.
Short motifs are known to play diverse roles in proteins, such as in mediating the interactions with other molecules, binding to membranes, or conducting a specific biological function. Standard approaches currently employed to detect short motifs in proteins search for enrichment of amino acid motifs considering mostly the sequence information. Here, we presented a new approach to search for common motifs (protein signatures) which share both physicochemical and structural properties, looking simultaneously at different features. Our method takes as an input an amino acid sequence and translates it to a new alphabet that reflects its intrinsic structural and chemical properties. Using the MEME search algorithm, we identified the protein signatures within subsets of proteins that encompass common sequence and structural information. We demonstrated that we can detect enriched structural motifs, such as the amphipathic helix, from large datasets of linear sequences, as well as predict common structural properties (such as disorder, surface accessibility, or secondary structures) of known functional motifs. Finally, we applied the method to the yeast protein interactome and identified novel putative interacting motifs. We propose that our approach can be applied for de novo protein function prediction given either sequence or structural information. Proteins 2013; © 2012 Wiley Periodicals, Inc.

6.
It is well known that stop codons play a critical role in the process of protein synthesis. However, little effort has been made to investigate whether stop codon usage exhibits biases, as is widely seen for synonymous codon usage. Here we systematically investigate stop codon usage bias in various eukaryotes as well as its relationships with its context, GC3 content, gene expression level, and secondary structure. The results show that there is a strong bias for stop codon usage in different eukaryotes, i.e., UAA is overrepresented in the lower eukaryotes, UGA is overrepresented in the higher eukaryotes, and UAG is least used in all eukaryotes. Different conserved patterns for each stop codon in different eukaryotic classes are found based on information content and logo analysis. GC3 contents increase with increasing complexity of organisms. Secondary structure prediction revealed that UAA is generally associated with loop structures, whereas UGA is more uniformly present in loop and stem structures, i.e., UGA is less biased toward having a particular structure. The stop codon usage bias, however, shows no significant relationship with GC3 content and gene expression level in individual eukaryotes. The results indicate that genomic complexity and GC3 content might contribute to stop codon usage bias in different eukaryotes. Our results indicate that stop codons, like synonymous codons, exhibit biases in usage. Additional work will be needed to understand the causes of these biases and their relationship to the mechanism of protein termination. [Reviewing Editor: Dr. Manyuan Long]
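The usage bias described in this abstract rests on a simple tally of which stop codon terminates each gene. A minimal sketch of that tally follows, assuming the input is a list of coding sequences that include their stop codon; the function name `stop_codon_usage` and the example sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter

STOP_CODONS = ("UAA", "UAG", "UGA")


def stop_codon_usage(coding_sequences):
    """Return the fraction of sequences that end in each stop codon."""
    counts = Counter()
    for seq in coding_sequences:
        # Normalize to upper-case RNA so that DNA input (with T) also works.
        codon = seq.strip().upper().replace("T", "U")[-3:]
        if codon in STOP_CODONS:
            counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        return {codon: 0.0 for codon in STOP_CODONS}
    return {codon: counts[codon] / total for codon in STOP_CODONS}


# Hypothetical toy input: four short coding sequences (DNA or RNA alphabet).
example = ["ATGGCCTAA", "ATGAAATGA", "AUGCCCUAA", "ATGTTTTAG"]
usage = stop_codon_usage(example)
print(usage)
```

Sequences that do not end in a recognized stop codon are skipped rather than counted, which is one possible design choice; a real analysis would also need to decide how to handle incomplete or misannotated genes.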

7.
MOTIVATION: Unravelling the rules underlying protein-protein and protein-ligand interactions is a crucial step in understanding cell machinery. Peptide recognition modules (PRMs) are globular protein domains which focus their binding targets on short protein sequences and play a key role in the frame of protein-protein interactions. High-throughput techniques permit the whole proteome scanning of each domain, but they are characterized by a high incidence of false positives. In this context, there is a pressing need for the development of in silico experiments to validate experimental results and of computational tools for the inference of domain-peptide interactions. RESULTS: We focused on the SH3 domain family and developed a machine-learning approach for inferring interaction specificity. SH3 domains are well-studied PRMs which typically bind proline-rich short sequences characterized by the PxxP consensus. The binding information is known to be held in the conformation of the domain surface and in the short sequence of the peptide. Our method relies on interaction data from high-throughput techniques and benefits from the integration of sequence and structure data of the interacting partners. Here, we propose a novel encoding technique aimed at representing binding information on the basis of the domain-peptide contact residues in complexes of known structure. Remarkably, the new encoding requires few variables to represent an interaction, thus avoiding the 'curse of dimension'. Our results display an accuracy >90% in detecting new binders of known SH3 domains, thus outperforming neural models on standard binary encodings, profile methods and recent statistical predictors. The method, moreover, shows a generalization capability, inferring specificity of unknown SH3 domains displaying some degree of similarity with the known data.

8.
9.
Estimating space-use and habitat preference from wildlife telemetry data   (Cited by: 2 total, 0 self-citations, 2 by others)
Management and conservation of populations of animals requires information on where they are, why they are there, and where else they could be. These objectives are typically approached by collecting data on the animals' use of space, relating these positional data to prevailing environmental conditions and employing the resulting statistical models to predict usage at other geographical regions. Technical advances in wildlife telemetry have accomplished manifold increases in the amount and quality of available data, creating the need for a statistical framework that can use them to make population‐level inferences for habitat preference and space‐use. This has been slow‐in‐coming because wildlife telemetry data are spatio‐temporally autocorrelated, often unbalanced, presence‐only observations of behaviourally complex animals, responding to a multitude of cross‐correlated environmental variables. We review the evolution of regression models for the analysis of space‐use and habitat preference and outline the essential features of a framework that emerges naturally from these foundations. This allows us to derive a relationship between usage of points in geographical space and preference of habitats in environmental space. Within this framework, we discuss eight challenges, inherent in the spatial analysis of telemetry data and, for each, we propose solutions that can work in tandem. Specifically, we propose a logistic, mixed‐effects approach that uses generalized additive transformations of the environmental covariates and is fitted to a response data‐set comprising the telemetry and simulated observations, under a case‐control design. We apply this framework to a non‐trivial case‐study using satellite‐tagged grey seals Halichoerus grypus from the east coast of Scotland. We perform model selection by cross‐validation and confront our final model's predictions with telemetry data from the same, as well as different, geographical regions. 
We conclude that, despite the complex behaviour of the study species, flexible empirical models can capture the environmental relationships that shape population distributions.

10.
Protein-protein interactions are critical to most biological processes, and locating protein-protein interfaces on protein structures is an important task in molecular biology. We developed a new experimental strategy called the ‘absence of interference’ approach to determine surface residues involved in protein-protein interaction of established yeast two-hybrid pairs of interacting proteins. One of the proteins is subjected to high-level randomization by error-prone PCR. The resulting library is selected by the yeast two-hybrid system for interacting clones that are isolated and sequenced. The interaction region can be identified by an absence or depletion of mutations. For data analysis and presentation, we developed a Web interface that analyzes the mutational spectrum and displays the mutational frequency on the surface of the structure (or a structural model) of the randomized protein. Additionally, this interface might be of use for the display of mutational distributions determined by other types of random mutagenesis experiments. We applied the approach to map the interface of the catalytic domain of the DNA methyltransferase Dnmt3a with its regulatory factor Dnmt3L. Dnmt3a was randomized with high mutational load. A total of 76 interacting clones were isolated and sequenced, and 648 mutations were identified. The mutational pattern allowed us to identify a unique interaction region on the surface of Dnmt3a, which comprises about 500-600 Å². The results were confirmed by site-directed mutagenesis and structural analysis. The absence-of-interference approach will allow high-throughput mapping of protein interaction sites suitable for functional studies and protein docking.
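The core of the analysis described here is counting how often each residue is mutated among the clones that still interact, then looking for positions with few or no mutations. A minimal sketch of that counting step follows, assuming each clone is given as a set of 1-based mutated residue positions; the function names, the `max_count` threshold, and the toy data are hypothetical, and the sketch does not reproduce the authors' Web interface or its mapping onto the protein surface.

```python
from collections import Counter


def mutation_counts(clone_mutations, protein_length):
    """Count, for each residue position (1-based), how many clones mutate it."""
    counts = Counter(pos for clone in clone_mutations for pos in clone)
    return [counts.get(pos, 0) for pos in range(1, protein_length + 1)]


def depleted_positions(counts, max_count=0):
    """Return 1-based positions mutated in at most `max_count` clones."""
    return [i + 1 for i, n in enumerate(counts) if n <= max_count]


# Hypothetical toy data: three interacting clones of a 10-residue protein.
clones = [{1, 2, 8}, {2, 7}, {1, 8, 9}]
counts = mutation_counts(clones, protein_length=10)
candidates = depleted_positions(counts)
print(counts)
print(candidates)
```

In this toy case the unmutated positions are only candidates: with real data, depletion would have to be judged against the expected mutation rate and restricted to surface-exposed residues.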

11.
MOTIVATION: Protein-protein docking algorithms typically generate large numbers of possible complex structures with only a few of them resembling the native structure. Recently (Duan et al., Protein Sci, 14:316-218, 2005), it was observed that the surface density of conserved residue positions is high at the interface regions of interacting protein surfaces, except for antibody-antigen complexes, where a lesser number of conserved positions than average is observed at the interface regions. Using this observation, we identified putative interacting regions on the surface of interacting partners and significantly improved docking results by assigning top ranks to near-native complex structures. In this paper, we combine the residue conservation information with a widely used shape complementarity algorithm to generate candidate complex structures with a higher percentage of near-native structures (hits). What is new in this work is that the conservation information is used early in the generation stage and not only in the ranking stage of the docking algorithm. This results in a significantly larger number of generated hits and an improved predictive ability in identifying the native structure of protein-protein complexes. RESULTS: We report on results from 48 well-characterized protein complexes, which have enough residue conservation information from the same 59 benchmark complexes used in our previous work. We compute conservation indices of residue positions on the surfaces of interacting proteins using available homologous sequences from UNIPROT and calculate the solvent accessible surface area. We combine this information with shape-complementarity scores to generate candidate protein-protein complex structures. When compared with pure shape-complementarity algorithms, performed by FTDock, our method results in significantly more hits, with the improvement being over 100% in many instances. 
We demonstrate that residue conservation information is useful not only in refinement and scoring of docking solutions, but also in the enrichment of near-native structures during the generation of candidate geometries of complex structures.

12.
Zhou Y, Zhou YS, He F, Song J, Zhang Z. Molecular BioSystems 2012, 8(5):1396-1404
Deciphering functional interactions between proteins is one of the great challenges in biology. Sequence-based homology-free encoding schemes have been increasingly applied to develop promising protein-protein interaction (PPI) predictors by means of statistical or machine learning methods. Here we analyze the relationship between codon pair usage and PPIs in yeast. We show that codon pair usage of interacting protein pairs differs significantly from random expectation. This motivates the development of a novel approach for predicting PPIs, with codon pair frequency difference as input to a Support Vector Machine predictor, termed CCPPI. 10-fold cross-validation tests based on yeast PPI datasets with balanced positive-to-negative ratios indicate that CCPPI performs better than other sequence-based encoding schemes. Moreover, it ranks the best when tested on an unbalanced large-scale dataset. Although CCPPI is subject to high false positive rates like many PPI predictors, statistical analyses of the predicted true positives confirm that the success of CCPPI is partly ascribed to its capability to capture proteomic co-expression and functional similarities between interacting protein pairs. Our findings suggest that codon pairs of interacting protein pairs evolve in a coordinated manner and consequently they provide additional information beyond amino acids-based encoding schemes. CCPPI has been made freely available at: http://protein.cau.edu.cn/ccppi.
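The abstract names "codon pair frequency difference" as the input feature but does not give the exact encoding. The sketch below shows one plausible reading, assuming adjacent in-frame codon pairs are counted per gene and the feature is the absolute difference of relative frequencies between the two genes; the function names and example sequences are hypothetical, and this is not claimed to be the CCPPI encoding itself.

```python
from collections import Counter


def codon_pair_frequencies(cds):
    """Relative frequency of each adjacent codon pair in one coding sequence."""
    cds = cds.upper()
    # Split into in-frame codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    pairs = Counter(zip(codons, codons[1:]))
    total = sum(pairs.values())
    return {pair: n / total for pair, n in pairs.items()} if total else {}


def codon_pair_difference(cds_a, cds_b):
    """Absolute per-pair frequency difference between two coding sequences."""
    freq_a = codon_pair_frequencies(cds_a)
    freq_b = codon_pair_frequencies(cds_b)
    keys = sorted(set(freq_a) | set(freq_b))
    return {k: abs(freq_a.get(k, 0.0) - freq_b.get(k, 0.0)) for k in keys}


# Hypothetical toy input: two very short coding sequences.
diff = codon_pair_difference("ATGGCCGCC", "ATGGCCAAA")
print(diff)
```

A classifier such as a Support Vector Machine would need these differences laid out as a fixed-length vector over all possible codon pairs; the dictionary form is used here only to keep the sketch short.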

13.
Predicting protein functions with message passing algorithms   (Cited by: 2 total, 0 self-citations, 2 by others)
MOTIVATION: In the last few years, a growing interest in biology has been shifting toward the problem of optimal information extraction from the huge amount of data generated via large-scale and high-throughput techniques. One of the most relevant issues to emerge recently is that of correctly and reliably predicting the functions of a given protein by exploiting information coming from the whole network of proteins physically interacting with the functionally undetermined one. In the present work, we will refer to an 'observed' protein as the one present in the protein-protein interaction networks published in the literature. METHODS: The method proposed in this paper is based on a message passing algorithm known as Belief Propagation, which accepts the network of proteins' physical interactions and a catalog of known protein functions as input, and returns the probabilities for each unclassified protein of having one chosen function. The implementation of the algorithm allows for fast online analysis, and can easily be generalized to more complex graph topologies taking into account hypergraphs, i.e. complexes of more than two interacting proteins. RESULTS: Benchmarks of our method are the two Saccharomyces cerevisiae protein-protein interaction networks and the Database of Interacting Proteins. The validity of our approach is successfully tested against other available techniques. CONTACT: leone@isiosf.isi.it SUPPLEMENTARY INFORMATION: http://isiosf.isi.it/~pagnani

14.
MOTIVATION: Protein-protein interaction, mediated by protein interaction sites, is intrinsic to many functional processes in the cell. In this paper, we propose a novel method to discover patterns in protein interaction sites. We observed from protein interaction networks that there exists a kind of significant substructure called interacting protein group pairs, which exhibit an all-versus-all interaction between the two protein sets in such a pair. The full interaction between the pair indicates a common interaction mechanism shared by the proteins in the pair, which can be referred to as an interaction type. Motif pairs at the interaction sites of the protein group pairs can be used to represent such an interaction type, with each motif derived from the sequences of a protein group by standard motif discovery algorithms. The systematic discovery of all pairs of interacting protein groups from large protein interaction networks is a computationally challenging problem. By a careful and sophisticated problem transformation, the problem is solved using efficient algorithms for mining frequent patterns, a problem extensively studied in data mining. RESULTS: We found 5349 pairs of interacting protein groups from a yeast interaction dataset. The expected value of sequence identity within the groups is only 7.48%, indicating non-homology within these protein groups. We derived 5343 motif pairs from these group pairs, represented in the form of blocks. Comparing our motifs with domains in the BLOCKS and PRINTS databases, we found that our blocks could be mapped to an average of 3.08 correlated blocks in these two databases. The mapped blocks occur in 4221 out of a total of 6794 domains (protein groups) in these two databases. Comparing our motif pairs with iPfam, consisting of 3045 interacting domain pairs derived from PDB, we found 47 matches occurring in 105 distinct PDB complexes. Comparing with another putative domain interaction database, InterDom, we found 203 matches.
AVAILABILITY: http://research.i2r.a-star.edu.sg/BindingMotifPairs/resources. SUPPLEMENTARY INFORMATION: http://research.i2r.a-star.edu.sg/BindingMotifPairs and Bioinformatics online.
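The "all-versus-all" property that defines an interacting protein group pair can be stated as a short check: every protein in one group must interact with every protein in the other. The sketch below illustrates only that definition, assuming undirected interactions given as pairs of identifiers; the function name and protein identifiers are hypothetical, and the sketch does not implement the frequent-pattern mining the authors use to find such pairs at scale.

```python
def is_all_versus_all(group_a, group_b, interactions):
    """Check that every protein in group_a interacts with every one in group_b."""
    # Store interactions as unordered pairs so (x, y) and (y, x) are equivalent.
    edges = {frozenset(pair) for pair in interactions}
    return all(frozenset((a, b)) in edges for a in group_a for b in group_b)


# Hypothetical toy network: P3 interacts with Q1 but not with Q2.
interactions = [
    ("P1", "Q1"), ("P1", "Q2"),
    ("P2", "Q1"), ("P2", "Q2"),
    ("P3", "Q1"),
]
full = is_all_versus_all({"P1", "P2"}, {"Q1", "Q2"}, interactions)
partial = is_all_versus_all({"P1", "P3"}, {"Q1", "Q2"}, interactions)
print(full, partial)
```

Checking a given pair of groups is cheap; the hard part the abstract refers to is enumerating all such pairs in a large network, which this check does not address.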

15.
Recent epigenomic studies have predicted thousands of potential enhancers in the human genome. However, there has not been systematic characterization of target promoters for these potential enhancers. Using H3K4me2 as a mark for active enhancers, we identified genome-wide enhancer-promoter (EP) interactions in human CD4+ T cells. Among the 6,520 long-distance chromatin interactions, we identify 2,067 enhancers that interact with 1,619 promoters and enhance their expression. These enhancers exist in accessible chromatin regions and are associated with various histone modifications and polymerase II binding. The promoters with interacting enhancers are expressed at higher levels than those without interacting enhancers, and their expression levels are positively correlated with the number of interacting enhancers. Interestingly, interacting promoters are co-expressed in a tissue-specific manner. We also find that chromosomes are organized into multiple levels of interacting domains. Our results define a global view of EP interactions and provide a data set to further understand mechanisms of enhancer targeting and long-range chromatin organization. The Gene Expression Omnibus accession number for the raw and analyzed chromatin interaction data is GSE32677.

16.
17.
Proteins rarely function in isolation; they form part of complex networks of interactions with other proteins within or among cells. The importance of a particular protein for cell viability depends directly on the number of interactions in which it participates and the function it performs: the larger the number of interactions of a protein, the greater its functional importance for the cell. With the advent of genome sequencing and "omics" technologies, it became feasible to conduct large-scale searches for protein interacting partners. Unfortunately, the accuracy of such analyses has been underwhelming owing to methodological limitations and to the inherent complexity of protein interactions. In addition to these experimental approaches, many computational methods have been developed to identify protein-protein interactions by assuming that interacting proteins coevolve as a result of the coadaptation dynamics between the amino acids of their interacting faces. We review the main technological advances made in the field of interactomics and discuss the feasibility of computational methods to identify protein-protein interactions based on the estimation of coevolution. As proof-of-concept, we present a classical case study: the interactions of cell surface proteins (receptors) and their ligands. Finally, we take this discussion one step forward to include interactions between organisms and species to understand the generation of biological complexity. Development of technologies for accurate detection of protein-protein interactions may shed light on processes that go from the fine-tuning of pathways and metabolic networks to the emergence of biological complexity.

18.
19.
Biological networks, such as genetic regulatory networks and protein interaction networks, provide important information for studying gene/protein activities. In this paper, we propose a new method, NetBoosting, for incorporating a priori biological network information in analyzing high dimensional genomics data. Specifically, we are interested in constructing prediction models for disease phenotypes of interest based on genomics data, and at the same time identifying disease-susceptible genes. We employ the gradient descent boosting procedure to build an additive tree model and propose a new algorithm to utilize the network structure in fitting small tree weak learners. We illustrate by simulation studies and a real data example that, by making use of the network information, NetBoosting outperforms a few existing methods in terms of accuracy of prediction and variable selection.

20.