首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
2.
MOTIVATION:The development of experimental methods for genome scale analysis of molecular interaction networks has made possible new approaches to inferring protein function. This paper describes a method of assigning functions based on a probabilistic analysis of graph neighborhoods in a protein-protein interaction network. The method exploits the fact that graph neighbors are more likely to share functions than nodes which are not neighbors. A binomial model of local neighbor function labeling probability is combined with a Markov random field propagation algorithm to assign function probabilities for proteins in the network. RESULTS: We applied the method to a protein-protein interaction dataset for the yeast Saccharomyces cerevisiae using the Gene Ontology (GO) terms as function labels. The method reconstructed known GO term assignments with high precision, and produced putative GO assignments to 320 proteins that currently lack GO annotation, which represents about 10% of the unlabeled proteins in S. cerevisiae.  相似文献   

3.
MOTIVATION: The task of engineering a protein to perform a target biological function is known as protein design. A commonly used paradigm casts this functional design problem as a structural one, assuming a fixed backbone. In probabilistic protein design, positional amino acid probabilities are used to create a random library of sequences to be simultaneously screened for biological activity. Clearly, certain choices of probability distributions will be more successful in yielding functional sequences. However, since the number of sequences is exponential in protein length, computational optimization of the distribution is difficult. RESULTS: In this paper, we develop a computational framework for probabilistic protein design following the structural paradigm. We formulate the distribution of sequences for a structure using the Boltzmann distribution over their free energies. The corresponding probabilistic graphical model is constructed, and we apply belief propagation (BP) to calculate marginal amino acid probabilities. We test this method on a large structural dataset and demonstrate the superiority of BP over previous methods. Nevertheless, since the results obtained by BP are far from optimal, we thoroughly assess the paradigm using high-quality experimental data. We demonstrate that, for small scale sub-problems, BP attains identical results to those produced by exact inference on the paradigmatic model. However, quantitative analysis shows that the distributions predicted significantly differ from the experimental data. These findings, along with the excellent performance we observed using BP on the smaller problems, suggest potential shortcomings of the paradigm. We conclude with a discussion of how it may be improved in the future.  相似文献   

4.
A fast method to predict protein interaction sites from sequences   总被引:15,自引:0,他引:15  
A simple method for predicting residues involved in protein interaction sites is proposed. In the absence of any structural report, the procedure identifies linear stretches of sequences as "receptor-binding domains" (RBDs) by analysing hydrophobicity distribution. The sequences of two databases of non-homologous interaction sites eliciting various biological activities were tested; 59-80 % were detected as RBDs. A statistical analysis of amino acid frequencies was carried out in known interaction sites and in predicted RBDs. RBDs were predicted from the 80,000 sequences of the Swissprot database. In both cases, arginine is the most frequently occurring residue. The RBD procedure can also detect residues involved in specific interaction sites such as the DNA-binding (95 % detected) and Ca-binding domains (83 % detected). We report two recent analyses; from the prediction of RBDs in sequences to the experimental demonstration of the functional activities. The examples concern a retroviral Gag protein and a penicillin-binding protein. We support that this method is a quick way to predict protein interaction sites from sequences and is helpful for guiding experiments such as site-specific mutageneses, two-hybrid systems or the synthesis of inhibitors.  相似文献   

5.

Background  

Determining the function of uncharacterized proteins is a major challenge in the post-genomic era due to the problem's complexity and scale. Identifying a protein's function contributes to an understanding of its role in the involved pathways, its suitability as a drug target, and its potential for protein modifications. Several graph-theoretic approaches predict unidentified functions of proteins by using the functional annotations of better-characterized proteins in protein-protein interaction networks. We systematically consider the use of literature co-occurrence data, introduce a new method for quantifying the reliability of co-occurrence and test how performance differs across species. We also quantify changes in performance as the prediction algorithms annotate with increased specificity.  相似文献   

6.

Background

In recent years, the integration of ‘omics’ technologies, high performance computation, and mathematical modeling of biological processes marks that the systems biology has started to fundamentally impact the way of approaching drug discovery. The LINCS public data warehouse provides detailed information about cell responses with various genetic and environmental stressors. It can be greatly helpful in developing new drugs and therapeutics, as well as improving the situations of lacking effective drugs, drug resistance and relapse in cancer therapies, etc.

Results

In this study, we developed a Ternary status based Integer Linear Programming (TILP) method to infer cell-specific signaling pathway network and predict compounds’ treatment efficacy. The novelty of our study is that phosphor-proteomic data and prior knowledge are combined for modeling and optimizing the signaling network. To test the power of our approach, a generic pathway network was constructed for a human breast cancer cell line MCF7; and the TILP model was used to infer MCF7-specific pathways with a set of phosphor-proteomic data collected from ten representative small molecule chemical compounds (most of them were studied in breast cancer treatment). Cross-validation indicated that the MCF7-specific pathway network inferred by TILP were reliable predicting a compound’s efficacy. Finally, we applied TILP to re-optimize the inferred cell-specific pathways and predict the outcomes of five small compounds (carmustine, doxorubicin, GW-8510, daunorubicin, and verapamil), which were rarely used in clinic for breast cancer. In the simulation, the proposed approach facilitates us to identify a compound’s treatment efficacy qualitatively and quantitatively, and the cross validation analysis indicated good accuracy in predicting effects of five compounds.

Conclusions

In summary, the TILP model is useful for discovering new drugs for clinic use, and also elucidating the potential mechanisms of a compound to targets.
  相似文献   

7.
8.
The Human Proteome Organization's Proteomics Standards Initiative (PSI) promotes the development of exchange standards to improve data integration and interoperability. PSI specifies the suitable level of detail required when reporting a proteomics experiment (via the Minimum Information About a Proteomics Experiment), and provides extensible markup language (XML) exchange formats and dedicated controlled vocabularies (CVs) that must be combined to generate a standard compliant document. The framework presented here tackles the issue of checking that experimental data reported using a specific format, CVs and public bio‐ontologies (e.g. Gene Ontology, NCBI taxonomy) are compliant with the Minimum Information About a Proteomics Experiment recommendations. The semantic validator not only checks the XML syntax but it also enforces rules regarding the use of an ontology class or CV terms by checking that the terms exist in the resource and that they are used in the correct location of a document. Moreover, this framework is extremely fast, even on sizable data files, and flexible, as it can be adapted to any standard by customizing the parameters it requires: an XML Schema Definition, one or more CVs or ontologies, and a mapping file describing in a formal way how the semantic resources and the format are interrelated. As such, the validator provides a general solution to the common problem in data exchange: how to validate the correct usage of a data standard beyond simple XML Schema Definition validation. The framework source code and its various applications can be found at http://psidev.info/validator .  相似文献   

9.
Wang C  Marshall A  Zhang D  Wilson ZA 《Plant physiology》2012,158(4):1523-1533
Protein interactions are fundamental to the molecular processes occurring within an organism and can be utilized in network biology to help organize, simplify, and understand biological complexity. Currently, there are more than 10 publicly available Arabidopsis (Arabidopsis thaliana) protein interaction databases. However, there are limitations with these databases, including different types of interaction evidence, a lack of defined standards for protein identifiers, differing levels of information, and, critically, a lack of integration between them. In this paper, we present an interactive bioinformatics Web tool, ANAP (Arabidopsis Network Analysis Pipeline), which serves to effectively integrate the different data sets and maximize access to available data. ANAP has been developed for Arabidopsis protein interaction integration and network-based study to facilitate functional protein network analysis. ANAP integrates 11 Arabidopsis protein interaction databases, comprising 201,699 unique protein interaction pairs, 15,208 identifiers (including 11,931 The Arabidopsis Information Resource Arabidopsis Genome Initiative codes), 89 interaction detection methods, 73 species that interact with Arabidopsis, and 6,161 references. ANAP can be used as a knowledge base for constructing protein interaction networks based on user input and supports both direct and indirect interaction analysis. It has an intuitive graphical interface allowing easy network visualization and provides extensive detailed evidence for each interaction. In addition, ANAP displays the gene and protein annotation in the generated interactive network with links to The Arabidopsis Information Resource, the AtGenExpress Visualization Tool, the Arabidopsis 1,001 Genomes GBrowse, the Protein Knowledgebase, the Kyoto Encyclopedia of Genes and Genomes, and the Ensembl Genome Browser to significantly aid functional network analysis. The tool is available open access at http://gmdd.shgmo.org/Computational-Biology/ANAP.  相似文献   

10.
Prediction of protein function using protein-protein interaction data.   总被引:8,自引:0,他引:8  
Assigning functions to novel proteins is one of the most important problems in the postgenomic era. Several approaches have been applied to this problem, including the analysis of gene expression patterns, phylogenetic profiles, protein fusions, and protein-protein interactions. In this paper, we develop a novel approach that employs the theory of Markov random fields to infer a protein's functions using protein-protein interaction data and the functional annotations of protein's interaction partners. For each function of interest and protein, we predict the probability that the protein has such function using Bayesian approaches. Unlike other available approaches for protein annotation in which a protein has or does not have a function of interest, we give a probability for having the function. This probability indicates how confident we are about the prediction. We employ our method to predict protein functions based on "biochemical function," "subcellular location," and "cellular role" for yeast proteins defined in the Yeast Proteome Database (YPD, www.incyte.com), using the protein-protein interaction data from the Munich Information Center for Protein Sequences (MIPS, mips.gsf.de). We show that our approach outperforms other available methods for function prediction based on protein interaction data. The supplementary data is available at www-hto.usc.edu/~msms/ProteinFunction.  相似文献   

11.
Chen L  Wu LY  Wang Y  Zhang XS 《Proteins》2006,62(4):833-837
To elucidate protein interaction networks is one of the major goals of functional genomics for whole organisms. So far, various computational methods have been proposed for inference of protein-protein interactions. Based on the association method by Sprinzak et al., we propose an association probabilistic method in this short communication to infer protein interactions directly from the experimental data, which outperformed other existing methods in terms of both accuracy and efficiency despite its simple form. Specifically, we show that the association probabilistic method achieves the highest accuracy among the existing approaches for the measures of root-mean-square error and the Pearson correlation coefficient, and also runs much faster than the LP-based method, by experimental dataset in Yeast. Software is available from the authors upon request.  相似文献   

12.
13.
A metabolome pipeline: from concept to data to knowledge   总被引:5,自引:3,他引:5  
Metabolomics, like other omics methods, produces huge datasets of biological variables, often accompanied by the necessary metadata. However, regardless of the form in which these are produced they are merely the ground substance for assisting us in answering biological questions. In this short tutorial review and position paper we seek to set out some of the elements of “best practice” in the optimal acquisition of such data, and in the means by which they may be turned into reliable knowledge. Many of these steps involve the solution of what amount to combinatorial optimization problems, and methods developed for these, especially those based on evolutionary computing, are proving valuable. This is done in terms of a “pipeline” that goes from the design of good experiments, through instrumental optimization, data storage and manipulation, the chemometric data processing methods in common use, and the necessary means of validation and cross-validation for giving conclusions that are credible and likely to be robust when applied in comparable circumstances to samples not used in their generation.This revised version was published online in June 2005. The previous version did not contain colour images.  相似文献   

14.
Recent developments in research on the stability of proteins - specifically, comparisons of the ion pairs of homologous structures - show that ion pairs potentially contribute to the thermostability of proteins. This study proposes a probabilistic Bayesian statistical method to efficiently predict the thermostability of proteins based on the properties of ion pairs. The experimental results suggest that the numbers, types and bond lengths of ion pairs can be used to predict with high accuracy (up to 80%) the thermostability of functionally similar proteins. The predictions have high precision (99%), especially for hyperthermophilic proteins. Results for proteins with differing functions also indicate that the number of ion pairs is related to the thermostability of proteins, and that predictions of thermostability can also be made for proteins with different functions.  相似文献   

15.
Protein structure prediction by using bioinformatics can involve sequence similarity searches, multiple sequence alignments, identification and characterization of domains, secondary structure prediction, solvent accessibility prediction, automatic protein fold recognition, constructing three-dimensional models to atomic detail, and model validation. Not all protein structure prediction projects involve the use of all these techniques. A central part of a typical protein structure prediction is the identification of a suitable structural target from which to extrapolate three-dimensional information for a query sequence. The way in which this is done defines three types of projects. The first involves the use of standard and well-understood techniques. If a structural template remains elusive, a second approach using nontrivial methods is required. If a target fold cannot be reliably identified because inconsistent results have been obtained from nontrivial data analyses, the project falls into the third type of project and will be virtually impossible to complete with any degree of reliability. In this article, a set of protocols to predict protein structure from sequence is presented and distinctions among the three types of project are given. These methods, if used appropriately, can provide valuable indicators of protein structure and function.  相似文献   

16.
MOTIVATION: Most approaches in predicting protein function from protein-protein interaction data utilize the observation that a protein often share functions with proteins that interacts with it (its level-1 neighbours). However, proteins that interact with the same proteins (i.e. level-2 neighbours) may also have a greater likelihood of sharing similar physical or biochemical characteristics. We speculate that functional similarity between a protein and its neighbours from the two different levels arise from two distinct forms of functional association, and a protein is likely to share functions with its level-1 and/or level-2 neighbours. We are interested in finding out how significant is functional association between level-2 neighbours and how they can be exploited for protein function prediction. RESULTS: We made a statistical study on recent interaction data and observed that functional association between level-2 neighbours is clearly observable. A substantial number of proteins are observed to share functions with level-2 neighbours but not with level-1 neighbours. We develop an algorithm that predicts the functions of a protein in two steps: (1) assign a weight to each of its level-1 and level-2 neighbours by estimating its functional similarity with the protein using the local topology of the interaction network as well as the reliability of experimental sources and (2) scoring each function based on its weighted frequency in these neighbours. Using leave-one-out cross validation, we compare the performance of our method against that of several other existing approaches and show that our method performs relatively well.  相似文献   

17.

Background  

The development of high-throughput technologies such as yeast two-hybrid systems and mass spectrometry technologies has made it possible to generate large protein-protein interaction (PPI) datasets. Mining these datasets for underlying biological knowledge has, however, remained a challenge.  相似文献   

18.
Animal innovations range from the discovery of novel food types to the invention of completely novel behaviours. Innovations can give access to new opportunities, and thus enable innovating agents to invade and create novel niches. This in turn can pave the way for morphological adaptation and adaptive radiation. The mechanisms that make innovations possible are probably as diverse as the innovations themselves. So too are their evolutionary consequences. Perhaps because of this diversity, we lack a unifying framework that links mechanism to function. We propose a framework for animal innovation that describes the interactions between mechanism, fitness benefit and evolutionary significance, and which suggests an expanded range of experimental approaches. In doing so, we split innovation into factors (components and phases) that can be manipulated systematically, and which can be investigated both experimentally and with correlational studies. We apply this framework to a selection of cases, showing how it helps us ask more precise questions and design more revealing experiments.  相似文献   

19.
Gene expression studies generate large quantities of data with the defining characteristic that the number of genes (whose expression profiles are to be determined) exceed the number of available replicates by several orders of magnitude. Standard spot-by-spot analysis still seeks to extract useful information for each gene on the basis of the number of available replicates, and thus plays to the weakness of microarrays. On the other hand, because of the data volume, treating the entire data set as an ensemble, and developing theoretical distributions for these ensembles provides a framework that plays instead to the strength of microarrays. We present theoretical results that under reasonable assumptions, the distribution of microarray intensities follows the Gamma model, with the biological interpretations of the model parameters emerging naturally. We subsequently establish that for each microarray data set, the fractional intensities can be represented as a mixture of Beta densities, and develop a procedure for using these results to draw statistical inference regarding differential gene expression. We illustrate the results with experimental data from gene expression studies on Deinococcus radiodurans following DNA damage using cDNA microarrays.  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号