
1.

Confirmation of human protein interaction data by human expression data

Andreas Hahn Jörg Rahnenführer Priti Talwar Thomas Lengauer 《BMC bioinformatics》2005,6(1):112

Background

With microarray technology the expression of thousands of genes can be measured simultaneously. It is well known that, in Saccharomyces cerevisiae, the expression levels of genes encoding interacting proteins are correlated significantly more strongly than those of genes encoding proteins that do not interact. The objective of this work is to investigate whether this observation extends to the human genome.
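To make this comparison concrete, here is a minimal sketch in Python. It computes the Pearson correlation of expression profiles for each gene pair and then compares the mean correlation of interacting pairs with that of non-interacting pairs. The gene names and expression values are hypothetical toy data. A real analysis, like the one in the study, would also need a significance test, not just a comparison of means.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mean_pair_correlation(expr, pairs):
    """Average expression correlation over a list of (gene_a, gene_b) pairs."""
    values = [pearson(expr[a], expr[b]) for a, b in pairs]
    return sum(values) / len(values)

# Hypothetical toy data: gene -> expression level in five conditions.
expr = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [1.1, 2.1, 2.9, 4.2, 5.1],
    "C": [5.0, 3.0, 4.0, 1.0, 2.0],
    "D": [2.0, 2.5, 1.0, 3.0, 1.5],
}
interacting = [("A", "B")]                  # assumed to encode interacting proteins
non_interacting = [("A", "C"), ("B", "D")]  # assumed non-interacting

mean_int = mean_pair_correlation(expr, interacting)
mean_non = mean_pair_correlation(expr, non_interacting)
print(f"interacting: {mean_int:.2f}, non-interacting: {mean_non:.2f}")
```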

2.

Predicting protein structure classes from function predictions

Sommer I Rahnenführer J Domingues FS de Lichtenberg U Lengauer T 《Bioinformatics (Oxford, England)》2004,20(5):770-776

MOTIVATION: We introduce a new approach to using the information contained in sequence-to-function prediction data in order to recognize protein template classes, a critical step in predicting protein structure. The data on which our method is based comprise probabilities of functional categories; for given query sequences these probabilities are obtained by a neural net that has previously been trained on a variety of functionally important features. On a training set of sequences we assess the relevance of individual functional categories for identifying a given structural family. Using a combination of the most relevant categories, the likelihood of a query sequence to belong to a specific family can be estimated. RESULTS: The performance of the method is evaluated using cross-validation. For a fixed structural family and for every sequence, a score is calculated that measures the evidence for family membership. Even for structural families of small size, family members receive significantly higher scores. For some examples, we show that the relevant functional features identified by this method are biologically meaningful. The proposed approach can be used to improve existing sequence-to-structure prediction methods. AVAILABILITY: Matlab code is available on request from the authors. The data are available at http://www.mpisb.mpg.de/~sommer/Fun2Struc/

3.

Global protein function annotation through mining genome-scale data in yeast Saccharomyces cerevisiae

Chen Y Xu D 《Nucleic acids research》2004,32(21):6414-6424

As we are moving into the post-genome-sequencing era, various high-throughput experimental techniques have been developed to characterize biological systems on the genomic scale. Discovering new biological knowledge from high-throughput biological data is a major challenge for bioinformatics today. To address this challenge, we developed a Bayesian statistical method, together with a Boltzmann machine and simulated annealing, for protein functional annotation in the yeast Saccharomyces cerevisiae that integrates various high-throughput biological data, including yeast two-hybrid data, protein complexes and microarray gene expression profiles. In our approach, we quantified the relationship between functional similarity and high-throughput data and encoded this relationship in a 'functional linkage graph', where each node represents one protein and the weight of each edge is the Bayesian probability of functional similarity between the two proteins. We also integrated evolutionary information and protein subcellular localization information into the prediction. Using our method, 1802 of 2280 unannotated proteins in yeast were systematically assigned functions.

4.

Managing and mining protein crystallization data

Amin AA Faux NG Fenalti G Williams G Bernadou A Daglish B Keefe K Middleton S Rae J Tetis K Law RH Fulton KF Rossjohn J Whisstock JC Buckle AM 《Proteins》2006,62(1):4-7

The crystallization of macromolecules remains a major bottleneck in structural biology. The routine screening of more than one thousand crystallization conditions and subsequent optimization by fine screening presents a challenge to conventional laboratory notebook keeping. In addition, the development of high-throughput robotic crystallization and imaging systems creates a pressing need for a low-cost laboratory information management system (LIMS). Here we describe CLIMS2, a crystallization LIMS with a simple, user-friendly graphical interface that allows the storage, management, retrieval and mining of crystallization data. The CLIMS2 executable and documentation are freely available at http://clims.med.monash.edu.au.

5.

CAFA and the Open World of protein function predictions

Christophe Dessimoz Nives Škunca Paul D. Thomas 《Trends in genetics : TIG》2013


6.

Novel function discovery through sequence and structural data mining

《Current opinion in structural biology》2016


7.

ZRANK: reranking protein docking predictions with an optimized energy function

Pierce B Weng Z 《Proteins》2007,67(4):1078-1086

Protein-protein docking requires fast and effective methods to discriminate correct from incorrect predictions generated by initial-stage docking. We have developed and tested a scoring function that uses detailed electrostatics, van der Waals, and desolvation terms to rescore initial-stage docking predictions. Weights for the scoring terms were optimized for a set of test cases, and the optimized function was then tested on an independent set of nonredundant cases. This program, named ZRANK, significantly improves the success rate over the initial ZDOCK rankings across a large benchmark. The number of test cases with No. 1 ranked hits increased from 2 to 11 and from 6 to 12 when predictions from two ZDOCK versions were considered. ZRANK can be applied either as a refinement protocol in itself or as a preprocessing stage to enrich the well-ranked hits prior to further refinement.
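The core idea of rescoring can be shown as a weighted linear sum of energy terms followed by a re-sort of the poses. The Python sketch below is written under simple assumptions: each pose carries one precomputed value per term (electrostatics, van der Waals, desolvation), and the weights are hypothetical. It does not reproduce ZRANK's actual term decomposition or its optimized weights, because the abstract gives neither.

```python
from dataclasses import dataclass

@dataclass
class Pose:
    """One initial-stage docking prediction with precomputed energy terms."""
    pose_id: str
    electrostatics: float
    van_der_waals: float
    desolvation: float

# Hypothetical weights; in the paper the weights were optimized on test cases.
WEIGHTS = {"electrostatics": 1.0, "van_der_waals": 0.5, "desolvation": 1.0}

def score(pose, weights=WEIGHTS):
    """Weighted sum of energy terms; a lower score means a more favourable pose."""
    return (weights["electrostatics"] * pose.electrostatics
            + weights["van_der_waals"] * pose.van_der_waals
            + weights["desolvation"] * pose.desolvation)

def rerank(poses, weights=WEIGHTS):
    """Return poses sorted from best (lowest score) to worst."""
    return sorted(poses, key=lambda p: score(p, weights))

poses = [
    Pose("p1", -5.0, -10.0, 2.0),
    Pose("p2", -20.0, -4.0, 1.0),
    Pose("p3", 3.0, -2.0, -1.0),
]
print([p.pose_id for p in rerank(poses)])
```

In this scheme, optimizing the weights means choosing the values in `WEIGHTS` that push near-native poses toward the top of the re-sorted list across a training benchmark.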

8.

Data mining the protein data bank: residue interactions

Oldfield TJ 《Proteins》2002,49(4):510-528

The Protein Data Bank contains a vast wealth of structural and functional information. The analysis of this macromolecular information has been the subject of considerable work aimed at advancing knowledge beyond the collection of molecular coordinates. This article presents a method that determines local structural information within proteins using mathematical data mining techniques. The mine program described here returns many known configurations of residues, such as the catalytic triad, metal binding sites and the N-linked glycosylation site, as well as many other multiple-residue interactions not previously categorized. Because mathematical constructs are used as targets, this method can identify information not previously known and can also provide unbiased results on typical structures and their expected deviations. Because the results are defined mathematically, they cannot by themselves indicate their biological implications. Therefore two support programs are described that provide insight into the biological context of the mine results. The first allows a weighted RMSD search between a template set of coordinates and a list of PDB files, and the second allows a protein to be labeled with the template results from mining to aid in its classification.

9.

Screening indirectly interacting proteins using gene expression profiles for protein function prediction

Gao Lei Zhu Mingzhu Guo Zheng Li Xia 《Chinese Journal of Bioinformatics》2006,4(3):105-108

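Using gene expression profile data, we screened and optimized a protein interaction network by calculating the expression correlation coefficients of interacting proteins. The results show that, with the screened interaction data, function prediction by the neighbor-counting method and the chi-square method improves markedly, and that neighbors farther from the query protein also carry information consistent with the query protein's function.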
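The two steps described here, first filtering the interaction network by expression correlation and then predicting function from neighbors, can be sketched in Python as follows. This sketch makes several assumptions. The correlation threshold of 0.5, the toy network and the annotations are all hypothetical. Only direct neighbors are counted. The chi-square method is not shown.

```python
from collections import Counter

def filter_by_coexpression(edges, corr, threshold):
    """Keep only interactions whose expression correlation reaches the threshold."""
    return [(a, b) for a, b in edges if corr[frozenset((a, b))] >= threshold]

def neighbor_counting(protein, edges, annotations, top=1):
    """Rank functions by how many direct neighbours of `protein` carry them."""
    counts = Counter()
    for a, b in edges:
        if protein == a:
            counts.update(annotations.get(b, []))
        elif protein == b:
            counts.update(annotations.get(a, []))
    return [f for f, _ in counts.most_common(top)]

# Hypothetical toy network around an unannotated query protein Q.
edges = [("Q", "A"), ("Q", "B"), ("C", "Q")]
corr = {frozenset(("Q", "A")): 0.8,
        frozenset(("Q", "B")): 0.7,
        frozenset(("Q", "C")): 0.1}
annotations = {"A": ["transport"],
               "B": ["transport", "metabolism"],
               "C": ["metabolism", "signaling"]}

kept = filter_by_coexpression(edges, corr, threshold=0.5)  # drops the weakly co-expressed Q-C edge
print(neighbor_counting("Q", kept, annotations))
```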

10.

Accuracy of protein flexibility predictions

Mauno Vihinen Esa Torkkila Pentti Riikonen 《Proteins》1994,19(2):141-149

Protein structural flexibility is important for catalysis, binding, and allostery. Flexibility has been predicted from amino acid sequence with a sliding-window averaging technique and applied primarily to epitope search. New prediction parameters were derived from 92 refined protein structures in an unbiased selection of the Protein Data Bank by further developing the method of Karplus and Schulz (Naturwissenschaften 72:212–213, 1985). The accuracy of four flexibility prediction techniques was studied by comparing the atomic temperature factors of known three-dimensional protein structures with the predictions, using correlation coefficients. The size of the prediction window was optimized for each method. Predictions made with our new parameters and an optimized window of 9 residues gave the best results. The difference from another previously used technique was small, whereas two other methods performed much worse. The applicability of the predictions was also tested by searching for known epitopes in amino acid sequences. The best techniques correctly predicted 20 of 31 continuous epitopes in seven proteins. Flexibility parameters have previously been used to calculate average protein flexibility indices, which are inversely correlated with protein stability. Indices computed with the new parameters correlated better with protein stability than those used previously, and they remained correlated even in cases where the old parameters failed. © 1994 Wiley-Liss, Inc.
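The sliding-window averaging technique described here can be sketched in a few lines of Python. First, each residue is mapped to a per-residue flexibility parameter. Then the profile is smoothed with a centered window, with 9 residues as the default, matching the optimized size reported above. Note that the parameter values are hypothetical placeholders, not the paper's table. Truncating the window at the chain ends is also a design choice assumed here.

```python
def window_average(values, window=9):
    """Centered sliding-window mean; windows are truncated at the chain ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        segment = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed

def flexibility_profile(sequence, params, window=9):
    """Map residues to flexibility parameters, then smooth along the chain."""
    return window_average([params[aa] for aa in sequence], window)

# Hypothetical parameter values for three residue types, not the published ones.
params = {"G": 1.05, "A": 0.98, "L": 0.93}
print(flexibility_profile("GAGLLAGGA", params))
```

To assess such a profile in the way the abstract describes, you would correlate it with the atomic temperature factors of a known structure.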

11.

Global protein interactome exploration through mining genome-scale data in Arabidopsis thaliana

Xu F Li G Zhao C Li Y Li P Cui J Deng Y Shi T 《BMC genomics》2010,11(Suppl 2):S2

Background

Many essential cellular processes, such as metabolism, transport and most regulatory mechanisms, rely on physical interactions between proteins. Genome-wide protein interactome networks have already been established for yeast, human and several animal organisms, but such a network remains to be established for plants.

Results

We first predicted protein-protein interactions in Arabidopsis thaliana using several methods (ortholog, SSBP, gene fusion, gene neighbor, phylogenetic profile, coexpression and protein domain), and then used a naïve Bayesian approach to integrate the results of these methods with text-mining data to build a genome-wide protein interactome network. We then validated the network using GO enrichment analysis, pathway data and the published literature; this validation shows that the network can feasibly be used to predict protein function and for other applications.

Conclusions

Our interactome is a comprehensive genome-wide network for the plant Arabidopsis thaliana and provides a rich resource for researchers in related fields who study protein function, molecular interactions and potential mechanisms under different conditions.
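Naïve Bayesian integration of this kind is usually expressed as multiplying prior odds by one likelihood ratio per evidence source, assuming the sources are independent. The Python sketch below illustrates that textbook form, and it does not reproduce the authors' exact model. The likelihood ratios and the prior odds are hypothetical. For simplicity, only methods that support a pair contribute a ratio, whereas a full naïve Bayes model would also score absent evidence.

```python
from math import exp, log

def posterior_odds(prior_odds, supporting_methods, likelihood_ratios):
    """Naive Bayes: prior odds times the likelihood ratio of each supporting
    method, summed in log space to avoid underflow; assumes independence."""
    log_odds = log(prior_odds) + sum(log(likelihood_ratios[m]) for m in supporting_methods)
    return exp(log_odds)

def posterior_probability(odds):
    """Convert odds to a probability."""
    return odds / (1.0 + odds)

# Hypothetical likelihood ratios; real values would be estimated from reference
# sets of known interacting and non-interacting protein pairs.
lrs = {"ortholog": 10.0, "coexpression": 4.0, "domain": 2.5}

odds = posterior_odds(0.01, ["coexpression", "domain"], lrs)
print(round(odds, 3), round(posterior_probability(odds), 3))
```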


12.

Genomewide predictions from maize single-cross data

Jon M. Massman Andres Gordillo Robenzon E. Lorenzana Rex Bernardo 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2013,126(1):13-22

Maize (Zea mays L.) breeders evaluate many single-cross hybrids each year in multiple environments. Our objective was to determine the usefulness of genomewide predictions, based on marker effects from maize single-cross data, for identifying the best untested single crosses and the best inbreds within a biparental cross. We considered 479 experimental maize single crosses between 59 Iowa Stiff Stalk Synthetic (BSSS) inbreds and 44 non-BSSS inbreds. The single crosses were evaluated in multilocation experiments from 2001 to 2009 and the BSSS and non-BSSS inbreds had genotypic data for 669 single nucleotide polymorphism (SNP) markers. Single-cross performance was predicted by a previous best linear unbiased prediction (BLUP) approach that utilized marker-based relatedness and information on relatives, and from genomewide marker effects calculated by ridge-regression BLUP (RR-BLUP). With BLUP, the mean prediction accuracy (r_MG) of single-cross performance was 0.87 for grain yield, 0.90 for grain moisture, 0.69 for stalk lodging, and 0.84 for root lodging. The BLUP and RR-BLUP models did not lead to r_MG values that differed significantly. We then used the RR-BLUP model, developed from single-cross data, to predict the performance of testcrosses within 14 biparental populations. The r_MG values within each testcross population were generally low and were often negative. These results were obtained despite the above-average level of linkage disequilibrium, i.e., r² between adjacent markers of 0.35 in the BSSS inbreds and 0.26 in the non-BSSS inbreds. Overall, our results suggested that genomewide marker effects estimated from maize single crosses are not advantageous (compared with BLUP) for predicting single-cross performance and have erratic usefulness for predicting testcross performance within a biparental cross.

13.

Proteomics data mining

《Expert review of proteomics》2013,10(6):599-603

Marc Wilkins completed his undergraduate and doctoral studies at Macquarie University, Sydney, Australia. During his doctoral studies, he defined the concept of the proteome and coined the term. After postdoctoral studies in Geneva, Switzerland, during which he co-edited the first book on proteomics, he returned to Australia, where he cofounded the company Proteome Systems. More recently, Marc took a position as Professor of Systems Biology at the University of New South Wales. He has established and directs the NSW Systems Biology Initiative, and is currently researching the role that protein post-translational modifications play in the regulation of protein-interaction networks.

14.

Consensus predictions of membrane protein topology

Nilsson J Persson B von Heijne G 《FEBS letters》2000,486(3):267-269

We have explored the possibility that consensus predictions of membrane protein topology might provide a means to estimate the reliability of a predicted topology. Using five current topology prediction methods and a test set of 60 Escherichia coli inner membrane proteins with experimentally determined topologies, we find that prediction performance varies strongly with the number of methods that agree, and that the topology of nearly half of all E. coli inner membrane proteins can be predicted with high reliability (>90% correct predictions) by a simple majority-vote approach.
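The following Python sketch illustrates a simple majority vote over the outputs of several topology predictors. It returns two values: the consensus topology and the number of methods that agree with it. The agreement count serves as a reliability signal, in line with the abstract's finding that performance tracks the number of agreeing methods. The topology labels are hypothetical, and so is the `min_agree` cutoff. The abstract does not specify how topologies are encoded or which agreement level gives >90% accuracy.

```python
from collections import Counter

def consensus_topology(predictions):
    """Return the most common predicted topology and how many methods agree on it.
    On a tie, Counter.most_common keeps the label seen first."""
    topology, votes = Counter(predictions).most_common(1)[0]
    return topology, votes

def is_reliable(votes, min_agree=4):
    """Hypothetical rule: trust the consensus only if enough methods agree."""
    return votes >= min_agree

# Hypothetical labels from five methods: N-terminus side / number of TM helices.
preds = ["in/4TM", "in/4TM", "in/4TM", "out/4TM", "in/3TM"]
topology, votes = consensus_topology(preds)
print(topology, votes, is_reliable(votes))
```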

15.

Data mining of soil salinization information based on phenological features

He Baozhong Ding Jianli Wang Fei Zhang Zhe Liu Bohua 《Acta Ecologica Sinica》2017,37(9):3133-3148

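Salinization is an important factor affecting the growth of vegetation and crops, so accurately retrieving its spatiotemporal distribution is essential. From MOD13A1 NDVI data we derived phenological parameters, including the start of the growing season (SOS), the end of the growing season (EOS) and the length of the growing season (LEN). We also computed several vegetation indices, salinity indices, terrain indices and drought indices that can map the spatial distribution of salinization with high accuracy. These parameters were used as input factors to a BP-ANN (back-propagation artificial neural network) to retrieve salinization information, and the study area was also partitioned by vegetation type and by landform type to examine how these affect salinization. The main conclusions are as follows: (1) Salinization is shaped by many factors and is mostly nonlinearly related to the phenological parameters, so it cannot be expressed by a simple fitted formula; the strong nonlinear fitting ability of an artificial neural network is needed to retrieve salinization information. (2) Mining vegetation phenology in depth markedly improved retrieval accuracy once phenological parameters were included: the coefficient of determination R² increased from 0.68 (non-phenological parameters only) to 0.79 (including phenological parameters). However, information on terrain, imagery and soil moisture still needs to be added to retrieve salinization more accurately. The biomass accumulation indicators LSI (large seasonal integral) and SSI (small seasonal integral) characterize salinization well. (3) Partitioning by vegetation type further improved the accuracy of salinization extraction, with R² reaching 0.88. (4) Partitioning by landform type gave an R² of 0.85, which is high but slightly lower than with vegetation-type partitioning. Salinization is generally more severe in low-elevation areas, and its degree is significantly influenced by terrain and landform. (5) Agricultural land is mostly non-saline or slightly saline, whereas sparsely vegetated areas are mostly severely saline. In the study area, non-saline and slightly saline land, moderately saline land and severely saline land account for 53.42%, 13.71% and 32.87%, respectively. These results present a method that combines phenological information with non-phenological parameters to retrieve salinization information through in-depth data mining of vegetation phenology; once phenological parameters are included, the accuracy of salinization prediction improves markedly.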

16.

Reliability of transmembrane predictions in whole-genome data

Käll L Sonnhammer EL 《FEBS letters》2002,532(3):415-418

Transmembrane prediction methods are generally benchmarked on a set of proteins with experimentally verified topology. We have investigated whether the accuracy measured on such datasets can be expected in an unbiased genomic analysis, or whether the benchmark datasets are biased towards 'easily predictable' proteins. As a measure of accuracy, we used the concordance of the results from five different prediction methods (TMHMM, PHD, HMMTOP, MEMSAT, and TOPPRED). The benchmark dataset showed significantly higher levels of agreement between methods (up to five times higher) than the 10 tested genomes. We have also analyzed which programs are most prone to mispredictions by measuring the frequency of one-out-of-five disagreeing predictions.
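The one-out-of-five analysis described here can be sketched in Python. The approach is to find proteins where exactly one method disagrees with all the others, and then count how often each method is that lone dissenter. The sketch also computes the fraction of proteins on which all methods agree. The five method names are taken from the abstract. However, the topology labels and the three example proteins are hypothetical.

```python
from collections import Counter

def odd_one_out(per_protein):
    """Count, per method, how often it is the single dissenter among the methods."""
    tally = Counter()
    for preds in per_protein:
        counts = Counter(preds.values())
        n = len(preds)
        if n >= 3 and sorted(counts.values()) == [1, n - 1]:
            dissenter = next(m for m, t in preds.items() if counts[t] == 1)
            tally[dissenter] += 1
    return tally

methods = ["TMHMM", "PHD", "HMMTOP", "MEMSAT", "TOPPRED"]
# Hypothetical predicted topologies (one label per method) for three proteins.
proteins = [
    dict(zip(methods, ["a", "a", "a", "a", "b"])),  # TOPPRED disagrees
    dict(zip(methods, ["a"] * 5)),                  # full agreement
    dict(zip(methods, ["c", "d", "d", "d", "d"])),  # TMHMM disagrees
]
print(odd_one_out(proteins))

# Concordance: fraction of proteins on which all methods agree.
full_agreement = sum(len(set(p.values())) == 1 for p in proteins) / len(proteins)
print(full_agreement)
```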

17.

Probabilistic protein function prediction from heterogeneous genome-wide data

Nariai N Kolaczyk ED Kasif S 《PloS one》2007,2(3):e337

Dramatic improvements in high-throughput sequencing technologies have led to a staggering growth in the number of predicted genes. However, a large fraction of these newly discovered genes do not have a functional assignment. Fortunately, a variety of novel high-throughput genome-wide functional screening technologies provide important clues that shed light on gene function. The integration of heterogeneous data to predict protein function has been shown to improve the accuracy of automated gene annotation systems. In this paper, we propose and evaluate a probabilistic approach for protein function prediction that integrates protein-protein interaction (PPI) data, gene expression data, protein motif information, mutant phenotype data, and protein localization data. First, functional linkage graphs are constructed from PPI data and gene expression data, in which an edge between nodes (proteins) represents evidence for functional similarity. The assumption here is that graph neighbors are more likely to share protein function than proteins that are not neighbors. The functional linkage graph model is then used in concert with protein domain, mutant phenotype and protein localization data to produce a functional prediction. Our method is applied to the functional prediction of Saccharomyces cerevisiae genes, using Gene Ontology (GO) terms as the basis of our annotation. In a cross-validation study we show that the integrated model increases recall by 18% at 50% precision, compared with using PPI data alone. We also show that the integrated predictor is significantly better than each individual predictor. However, the observed improvement over PPI depends on both the new source of data and the functional category to be predicted. Surprisingly, in some contexts integration hurts overall prediction accuracy. Lastly, we provide a comprehensive assignment of putative GO terms to 463 proteins that currently have no assigned function.

18.

Predicting protein function from sequence and structural data

Watson JD Laskowski RA Thornton JM 《Current opinion in structural biology》2005,15(3):275-284

When a protein's function cannot be experimentally determined, it can often be inferred from sequence similarity. Should this process fail, analysis of the protein structure can provide functional clues or confirm tentative functional assignments inferred from the sequence. Many structure-based approaches exist (e.g. fold similarity, three-dimensional templates), but as no single method can be expected to be successful in all cases, a more prudent approach involves combining multiple methods. Several automated servers that integrate evidence from multiple sources have been released this year and particular improvements have been seen with methods utilizing the Gene Ontology functional annotation schema.

19.

Predicting protein function from protein/protein interaction data: a probabilistic approach

Letovsky S Kasif S 《Bioinformatics (Oxford, England)》2003,19(Suppl 1):i197-i204

MOTIVATION: The development of experimental methods for genome-scale analysis of molecular interaction networks has made possible new approaches to inferring protein function. This paper describes a method of assigning functions based on a probabilistic analysis of graph neighborhoods in a protein-protein interaction network. The method exploits the fact that graph neighbors are more likely to share functions than nodes which are not neighbors. A binomial model of local neighbor function labeling probability is combined with a Markov random field propagation algorithm to assign function probabilities for proteins in the network. RESULTS: We applied the method to a protein-protein interaction dataset for the yeast Saccharomyces cerevisiae using the Gene Ontology (GO) terms as function labels. The method reconstructed known GO term assignments with high precision, and produced putative GO assignments to 320 proteins that currently lack GO annotation, which represents about 10% of the unlabeled proteins in S. cerevisiae.
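The local step of such a model can be sketched as Bayes' rule applied to two binomial likelihoods. Suppose a protein has n labeled neighbors, and k of them carry function f. If the protein itself has f, assume each neighbor carries f with probability p_with. If it does not, assume each neighbor carries f with probability p_without. This parameterization is an assumption about the model's form, and all parameter values below are hypothetical. The Markov random field propagation that the abstract combines with this step is not shown.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def posterior_has_function(n, k, prior, p_with, p_without):
    """P(protein has f | k of its n labelled neighbours have f) under two
    binomial likelihoods (assumed model form, not the paper's fitted values)."""
    with_f = prior * binomial_pmf(k, n, p_with)
    without_f = (1 - prior) * binomial_pmf(k, n, p_without)
    return with_f / (with_f + without_f)

# Hypothetical parameters: prior frequency of f and neighbour-labelling rates.
posteriors = [posterior_has_function(4, k, prior=0.1, p_with=0.6, p_without=0.1)
              for k in range(5)]
for k, p in enumerate(posteriors):
    print(k, round(p, 3))
```

With these toy numbers, the posterior climbs steeply as more neighbors share the function. This behavior captures the "neighbors share functions" assumption stated in the abstract.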

20.

Prediction of protein function using protein-protein interaction data.

Minghua Deng Kui Zhang Shipra Mehta Ting Chen Fengzhu Sun 《Journal of computational biology》2003,10(6):947-960

Assigning functions to novel proteins is one of the most important problems in the postgenomic era. Several approaches have been applied to this problem, including the analysis of gene expression patterns, phylogenetic profiles, protein fusions, and protein-protein interactions. In this paper, we develop a novel approach that employs the theory of Markov random fields to infer a protein's functions using protein-protein interaction data and the functional annotations of its interaction partners. For each function of interest and each protein, we predict the probability that the protein has that function using Bayesian approaches. Unlike other available approaches for protein annotation, in which a protein either has or does not have a function of interest, we give a probability of having the function. This probability indicates how confident we are about the prediction. We employ our method to predict protein functions based on "biochemical function," "subcellular location," and "cellular role" for yeast proteins defined in the Yeast Proteome Database (YPD, www.incyte.com), using the protein-protein interaction data from the Munich Information Center for Protein Sequences (MIPS, mips.gsf.de). We show that our approach outperforms other available methods for function prediction based on protein interaction data. The supplementary data are available at www-hto.usc.edu/~msms/ProteinFunction.