1.

A combined approach for genome wide protein function annotation/prediction

Alfredo Benso Stefano Di Carlo Hafeez ur Rehman Gianfranco Politano Alessandro Savino Prashanth Suravajhala 《Proteome science》2013,11(Z1):S1

Background

Today, large-scale genome sequencing technologies are uncovering an increasing number of new genes and proteins, which remain uncharacterized. Experimental procedures for protein function prediction are low-throughput by nature and thus cannot keep up with the rate at which new proteins are discovered. On the other hand, proteins are the prominent stakeholders in almost all biological processes, and therefore precise knowledge of their functions is needed for a better understanding of the underlying biological mechanisms. The challenge of annotating uncharacterized proteins in functional genomics and biology in general motivates the use of well-orchestrated computational techniques to accurately predict their functions.

Methods

We propose a computational flow for the functional annotation of a protein that assigns the most probable functions to the protein by aggregating heterogeneous information. The information considered includes protein motifs, protein sequence similarity, and protein homology data gathered from interacting proteins, combined with data from highly similar non-interacting proteins (hereinafter called Similactors). Moreover, to increase the predictive power of our model, we also compute and integrate term-specific relationships among functional terms based on the Gene Ontology (GO).
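The aggregation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each evidence source yields a score in [0, 1] per candidate GO term and combines the sources by a weighted average. The function name `aggregate_go_scores`, the source names, the weights, and the example scores are all hypothetical.

```python
from collections import defaultdict


def aggregate_go_scores(evidence, weights):
    """Combine per-source GO term scores into one ranked list.

    evidence: {source_name: {go_term: score in [0, 1]}}
    weights:  {source_name: non-negative weight}  (hypothetical values)
    Returns a list of (go_term, combined_score), best first.
    """
    total_weight = sum(weights[source] for source in evidence)
    if total_weight <= 0:
        raise ValueError("at least one evidence source needs a positive weight")

    combined = defaultdict(float)
    for source, term_scores in evidence.items():
        for term, score in term_scores.items():
            combined[term] += weights[source] * score

    # Divide by the total weight so combined scores stay in [0, 1].
    ranked = [(term, value / total_weight) for term, value in combined.items()]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


if __name__ == "__main__":
    # Illustrative scores only; real values would come from motif scans,
    # sequence similarity searches, interactors, and Similactors.
    example = {
        "motifs": {"GO:0003677": 1.0},
        "similarity": {"GO:0003677": 0.5, "GO:0005524": 1.0},
    }
    print(aggregate_go_scores(example, {"motifs": 1.0, "similarity": 1.0}))
```

A weighted average is only one possible combination rule; the abstract does not say how the sources or the GO term relationships are actually weighted.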

Results

We tested our method on Saccharomyces cerevisiae and Homo sapiens proteins. The aggregation of different structural and functional evidence with GO relationships outperforms the other methods reported in the literature in terms of precision and accuracy of prediction. The predicted precision and accuracy are 100% for more than half of the input set for both species; overall, we obtained 85.38% precision and 81.95% accuracy for Homo sapiens and 79.73% precision and 80.06% accuracy for Saccharomyces cerevisiae proteins.

2.

Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps (Total citations: 2; self-citations: 0; citations by others: 2)

Nabieva E Jim K Agarwal A Chazelle B Singh M 《Bioinformatics (Oxford, England)》2005,21(Z1):i302-i310

3.

Genome wide identification and classification of alternative splicing based on EST data (Total citations: 1; self-citations: 0; citations by others: 1)

Gupta S Zink D Korn B Vingron M Haas SA 《Bioinformatics (Oxford, England)》2004,20(16):2579-2585

MOTIVATION: Alternative splicing is currently seen to explain the vast disparity between the number of predicted genes in the human genome and the highly diverse proteome. The mapping of expressed sequence tag (EST) consensus sequences derived from the GeneNest database onto the genome provides an efficient way of predicting exon-intron boundaries, gene structure and alternative splicing events. However, the alternative splicing events are obscured by a large number of putatively artificial exon boundaries arising due to genomic contamination or alignment errors. The current work describes a methodology to associate quality values to the predicted exon-intron boundaries. High quality exon-intron boundaries are used to predict constitutive and alternative splicing ranked by confidence values, aiming to facilitate large-scale analysis of alternative splicing and splicing in general. RESULTS: Applying the current methodology, constitutive splicing is observed in 33,270 EST clusters, out of which 45% are alternatively spliced. The classification derived from the computed confidence values for 17 of these splice events frequently correlates (15/17) with RT-PCR experiments performed for 40 different tissue samples. As an application of the confidence measure, an evaluation of the distribution of alternative splicing revealed that the majority of variants correspond to the coding regions of the genes. However, a significant fraction still maps to non-coding regions, thereby indicating a functional relevance of alternative splicing in untranslated regions. AVAILABILITY: The predicted alternative splice variants are visualized in the SpliceNest database at http://splicenest.molgen.mpg.de

4.

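Protein function prediction based on grouped weight encoding

Wang Xiuhe Wang Zhenghua Wang Yongxian Zhang Zhenhui 《生物信息学》2007,5(1):25-27

Starting from the protein sequence, protein function is predicted using Encoding Based on Grouped Weight (EBGW) combined with the nearest-neighbor algorithm. Predictions were made for 1826 protein sequences from yeast (Saccharomyces cerevisiae), and the overall prediction accuracy is comparable to that of other sequence-based protein function prediction methods. The experimental results show that the new method based on the EBGW encoding scheme can be applied effectively to protein function prediction.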

5.

Genome wide discovery of genetic variants affecting alternative splicing patterns in human using bioinformatics method

Seonggyun Han Hyeim Jung Kichan Lee Hyunho Kim Sangsoo Kim 《Genes & genomics.》2017,39(4):453-459

6.

An efficient strategy for extensive integration of diverse biological data for protein function prediction

Chua HN Sung WK Wong L 《Bioinformatics (Oxford, England)》2007,23(24):3364-3373

MOTIVATION: With the increasing availability of diverse biological information, protein function prediction approaches have converged towards integration of heterogeneous data. Many adapted existing techniques, such as machine-learning and probabilistic methods, which have proven successful on specific data types. However, the impact of these approaches is hindered by a couple of factors. First, there is little comparison between existing approaches. This is in part due to a divergence in the focus adopted by different works, which makes comparison difficult or even fuzzy. Second, there seems to be over-emphasis on the use of computationally demanding machine-learning methods, which runs counter to the surge in biological data. Analogous to the success of BLAST for sequence homology search, we believe that the ability to tap escalating quantity, quality and diversity of biological data is crucial to the success of automated function prediction as a useful instrument for the advancement of proteomic research. We address these problems by: (1) providing useful comparison between some prominent methods; (2) proposing Integrated Weighted Averaging (IWA)--a scalable, efficient and flexible function prediction framework that integrates diverse information using simple weighting strategies and a local prediction method. The simplicity of the approach makes it possible to make predictions based on on-the-fly information fusion. RESULTS: In addition to its greater efficiency, IWA performs exceptionally well against existing approaches. In the presence of cross-genome information, which is overwhelming for existing approaches, IWA makes even better predictions. We also demonstrate the significance of appropriate weighting strategies in data integration.
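The general idea of combining data sources with simple weights and a local (neighbor-based) prediction can be sketched as follows. This is an illustrative weighted neighbor vote, not the published IWA formula: the function name `weighted_neighbor_vote`, the source names, and the weights are assumptions made for the example.

```python
def weighted_neighbor_vote(neighbors_by_source, annotations, source_weights):
    """Score candidate functions for one query protein.

    neighbors_by_source: {source_name: [neighbor protein ids]}
    annotations:         {protein id: set of function labels}
    source_weights:      {source_name: weight}  (hypothetical values)
    Returns {function label: weighted fraction of neighbors carrying it}.
    """
    scores = {}
    normaliser = 0.0
    for source, neighbors in neighbors_by_source.items():
        weight = source_weights[source]
        for neighbor in neighbors:
            # Every neighbor contributes its source weight to the normaliser,
            # whether or not it is annotated.
            normaliser += weight
            for function in annotations.get(neighbor, ()):
                scores[function] = scores.get(function, 0.0) + weight
    if normaliser == 0:
        return {}
    return {function: value / normaliser for function, value in scores.items()}


if __name__ == "__main__":
    neighbors = {"ppi": ["A", "B"], "blast": ["C"]}
    known = {"A": {"kinase"}, "B": {"kinase", "transport"}, "C": {"transport"}}
    print(weighted_neighbor_vote(neighbors, known, {"ppi": 1.0, "blast": 2.0}))
```

Because the score is a single pass over the neighbors of one protein, new data sources can be added by extending the two dictionaries, which is in the spirit of the on-the-fly fusion the abstract describes.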

7.

User guide for the discovery of potential drugs via protein structure prediction and ligand docking simulation

Shaker Bilal Yu Myung-Sang Lee Jingyu Lee Yongmin Jung Chanjin Na Dokyun 《Journal of microbiology (Seoul, Korea)》2020,58(3):235-244

Due to accumulating protein structure information and advances in computational methodologies, it has now become possible to predict protein-compound interactions. In...

8.

BioOptimizer: a Bayesian scoring function approach to motif discovery (Total citations: 5; self-citations: 0; citations by others: 5)

Jensen ST Liu JS 《Bioinformatics (Oxford, England)》2004,20(10):1557-1564

9.

SWISS-PROT: connecting biomolecular knowledge via a protein database

Gasteiger E Jung E Bairoch A 《Current issues in molecular biology》2001,3(3):47-55

With the explosive growth of biological data, the development of new means of data storage was needed. More and more often biological information is no longer published in the conventional way via a publication in a scientific journal, but only deposited into a database. In the last two decades these databases have become essential tools for researchers in biological sciences. Biological databases can be classified according to the type of information they contain. There are basically three types of sequence-related databases (nucleic acid sequences, protein sequences and protein tertiary structures) as well as various specialized data collections. It is important to provide the users of biomolecular databases with a degree of integration between these databases as by nature all of these databases are connected in a scientific sense and each one of them is an important piece to biological complexity. In this review we will highlight our effort in connecting biological information as demonstrated in the SWISS-PROT protein database.

10.

Genome mining approach for the discovery of novel cytochrome P450 biocatalysts

Toshiki Furuya Kuniki Kino 《Applied microbiology and biotechnology》2010,86(4):991-1002

Cytochrome P450 enzymes (P450s) are able to regioselectively and stereoselectively introduce oxygen into organic compounds under mild reaction conditions. These monooxygenases in particular easily catalyze the insertion of oxygen into less reactive carbon–hydrogen bonds. Hence, P450s are of considerable interest as oxidation biocatalysts. To date, although several P450s have been discovered through screening of microorganisms and have been further genetically engineered, the substrate range of these biocatalysts is still limited to fulfill the requirements for a large number of oxidation processes. On the other hand, the recent rapid expansion in the number of reported microbial genome sequences has revealed the presence of an unexpectedly vast number of P450 genes. This large pool of naturally evolved P450s has attracted much attention as a resource for new oxidation biocatalysts. In this review, we focus on aspects of the genome mining approach that are relevant for the discovery of novel P450 biocatalysts. This approach opens up possibilities for exploitation of the catalytic potential of P450s for the preparation of a large choice of oxidation biocatalysts with a variety of substrate specificities.

11.

Protein function prediction: towards integration of similarity metrics (Total citations: 1; self-citations: 0; citations by others: 1)

Erdin S Lisewski AM Lichtarge O 《Current opinion in structural biology》2011,21(2):180-188

Genomic centers discover increasingly many protein sequences and structures, but not necessarily their full biological functions. Thus, currently, less than one percent of proteins have experimentally verified biochemical activities. To fill this gap, function prediction algorithms apply metrics of similarity between proteins on the premise that those sufficiently alike in sequence, or structure, will perform identical functions. Although high sensitivity is elusive, network analyses that integrate these metrics together hold the promise of rapid gains in function prediction specificity.

12.

Improved prediction of critical residues for protein function based on network and phylogenetic analyses

Boris Thibert Dale E Bredesen Gabriel del Rio 《BMC bioinformatics》2005,6(1):213

Background

Phylogenetic approaches are commonly used to predict which amino acid residues are critical to the function of a given protein. However, such approaches display inherent limitations, such as the requirement for identification of multiple homologues of the protein under consideration. Therefore, complementary or alternative approaches for the prediction of critical residues would be desirable. Network analyses have been used in the modelling of many complex biological systems, but only very recently have they been used to predict critical residues from a protein's three-dimensional structure. Here we compare a couple of phylogenetic approaches to several different network-based methods for the prediction of critical residues, and show that a combination of one phylogenetic method and one network-based method is superior to other methods previously employed.

13.

Assignment of protein function and discovery of novel nucleolar proteins based on automatic analysis of MEDLINE

Schuemie M Chichester C Lisacek F Coute Y Roes PJ Sanchez JC Kors J Mons B 《Proteomics》2007,7(6):921-931

Attribution of the most probable functions to proteins identified by proteomics is a significant challenge that requires extensive literature analysis. We have developed a system for automated prediction of implicit and explicit biologically meaningful functions for a proteomics study of the nucleolus. This approach uses a set of vocabulary terms to map and integrate the information from the entire MEDLINE database. Based on a combination of cross-species sequence homology searches and the corresponding literature, our approach facilitated the direct association between sequence data and information from biological texts describing function. Comparison of our automated functional assignment to manual annotation demonstrated our method to be highly effective. To establish the sensitivity, we defined the functional subtleties within a family containing a highly conserved sequence. Clustering of the DEAD-box protein family of RNA helicases confirmed that these proteins shared similar morphology although functional subfamilies were accurately identified by our approach. We visualized the nucleolar proteome in terms of protein functions using multi-dimensional scaling, showing functional associations between nucleolar proteins that were not previously realized. Finally, by clustering the functional properties of the established nucleolar proteins, we predicted novel nucleolar proteins. Subsequently, nonproteomics studies confirmed the predictions of previously unidentified nucleolar proteins.

14.

JIGSAW: integration of multiple sources of evidence for gene prediction (Total citations: 3; self-citations: 0; citations by others: 3)

Allen JE Salzberg SL 《Bioinformatics (Oxford, England)》2005,21(18):3596-3603

MOTIVATION: Computational gene finding systems play an important role in finding new human genes, although no systems are yet accurate enough to predict all or even most protein-coding regions perfectly. Ab initio programs can be augmented by evidence such as expression data or protein sequence homology, which improves their performance. The amount of such evidence continues to grow, but computational methods continue to have difficulty predicting genes when the evidence is conflicting or incomplete. Genome annotation pipelines collect a variety of types of evidence about gene structure and synthesize the results, which can then be refined further through manual, expert curation of gene models. RESULTS: JIGSAW is a new gene finding system designed to automate the process of predicting gene structure from multiple sources of evidence, with results that often match the performance of human curators. JIGSAW computes the relative weight of different lines of evidence using statistics generated from a training set, and then combines the evidence using dynamic programming. Our results show that JIGSAW's performance is superior to ab initio gene finding methods and to other pipelines such as Ensembl. Even without evidence from alignment to known genes, JIGSAW can substantially improve gene prediction accuracy as compared with existing methods. AVAILABILITY: JIGSAW is available as an open source software package at http://cbcb.umd.edu/software/jigsaw.

15.

Enhanced protein fold recognition through a novel data integration approach

Yiming Ying Kaizhu Huang Colin Campbell 《BMC bioinformatics》2009,10(1):267-18

Background

Protein fold recognition is a key step in protein three-dimensional (3D) structure discovery. There are multiple fold discriminatory data sources which use physicochemical and structural properties as well as further data sources derived from local sequence alignments. This raises the issue of finding the most efficient method for combining these different informative data sources and exploring their relative significance for protein fold classification. Kernel methods have been extensively used for biological data analysis. They can incorporate separate fold discriminatory features into kernel matrices which encode the similarity between samples in their respective data sources.

16.

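Application of protein microarrays in the discovery of tumor serum markers

Zhang Ying Yang Lina Tao Shengce 《生命的化学》2012,(1):5-11

The protein microarray is a new high-throughput proteomics technology. Because it is high-throughput, miniaturized, and capable of rapid parallel analysis, it has broad application prospects in research on the discovery of tumor serum markers. This article reviews the basic principles and types of protein microarrays and their application in tumor serum marker discovery, compares protein microarray technology with traditional tumor marker discovery techniques, and discusses prospects for the further application of protein microarray technology in tumor marker discovery.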

17.

HYPLOSP: a knowledge-based approach to protein local structure prediction

Chen CT Lin HN Sung TY Hsu WL 《Journal of bioinformatics and computational biology》2006,4(6):1287-1307

Local structure prediction can facilitate ab initio structure prediction, protein threading, and remote homology detection. However, the accuracy of existing methods is limited. In this paper, we propose a knowledge-based prediction method that assigns a measure called the local match rate to each position of an amino acid sequence to estimate the confidence of our method. Empirically, the accuracy of the method correlates positively with the local match rate; therefore, we employ it to predict the local structures of positions with a high local match rate. For positions with a low local match rate, we propose a neural network prediction method. To better utilize the knowledge-based and neural network methods, we design a hybrid prediction method, HYPLOSP (HYbrid method to Protein LOcal Structure Prediction) that combines both methods. To evaluate the performance of the proposed methods, we first perform cross-validation experiments by applying our knowledge-based method, a neural network method, and HYPLOSP to a large dataset of 3,925 protein chains. We test our methods extensively on three different structural alphabets and evaluate their performance by two widely used criteria, Maximum Deviation of backbone torsion Angle (MDA) and Q(N), which is similar to Q(3) in secondary structure prediction. We then compare HYPLOSP with three previous studies using a dataset of 56 new protein chains. HYPLOSP shows promising results in terms of MDA and Q(N) accuracy and demonstrates its alphabet-independent capability.
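The hybrid scheme described in this abstract, which routes each sequence position to one of two predictors according to its local match rate, can be sketched as below. This is only an illustration of the dispatch logic: the threshold value, the function name `hybrid_predict`, and the stand-in predictors are assumptions, and the abstract does not state how the actual cut-off is chosen.

```python
from typing import Callable, List, Sequence, Tuple


def hybrid_predict(
    positions: Sequence[Tuple[float, str]],
    knowledge_based: Callable[[str], str],
    neural_network: Callable[[str], str],
    threshold: float = 0.5,  # hypothetical cut-off on the local match rate
) -> List[str]:
    """Predict one local-structure label per position.

    positions: (local_match_rate, sequence_window) pairs, one per position.
    Positions at or above the threshold go to the knowledge-based predictor,
    the rest go to the neural network predictor.
    """
    labels = []
    for match_rate, window in positions:
        if match_rate >= threshold:
            labels.append(knowledge_based(window))
        else:
            labels.append(neural_network(window))
    return labels


if __name__ == "__main__":
    # Stand-in predictors that just tag which branch handled the window.
    demo = hybrid_predict(
        [(0.9, "ACDEF"), (0.2, "GHIKL")],
        knowledge_based=lambda window: "KB:" + window,
        neural_network=lambda window: "NN:" + window,
    )
    print(demo)
```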

18.

An algorithm for protein secondary structure prediction based on class prediction (Total citations: 19; self-citations: 0; citations by others: 19)

G Deléage B Roux 《Protein engineering》1987,1(4):289-294

An algorithm has been developed to improve the success rate in the prediction of the secondary structure of proteins by taking into account the predicted class of the proteins. This method has been called the 'double prediction method' and consists of a first prediction of the secondary structure from a new algorithm which uses parameters of the type described by Chou and Fasman, and the prediction of the class of the proteins from their amino acid composition. These two independent predictions allow one to optimize the parameters calculated over the secondary structure database to provide the final prediction of secondary structure. This method has been tested on 59 proteins in the database (i.e. 10,322 residues) and yields 72% success in class prediction, 61.3% of residues correctly predicted for three states (helix, sheet and coil) and a good agreement between observed and predicted contents in secondary structure.

19.

Protein complexes discovery based on protein-protein interaction data via a regularized sparse generative network model

Zhang XF Dai DQ Li XX 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2012,9(3):857-870

Detecting protein complexes from protein interaction networks is one major task in the postgenome era. Previously developed computational algorithms for identifying complexes mainly focus on graph partitioning or dense region finding. Most of these traditional algorithms cannot discover overlapping complexes, which really exist in protein-protein interaction (PPI) networks. Even though some density-based methods have been developed to identify overlapping complexes, they are not able to discover complexes that include peripheral proteins. In this study, motivated by the recent successful application of generative network models to describe the generation process of PPI networks and to detect communities in social networks, we develop a regularized sparse generative network model (RSGNM) for protein complex identification, by adding another process that generates propensities using an exponential distribution and incorporating a Laplacian regularizer into an existing generative network model. By assuming that the propensities are generated using an exponential distribution, the estimators of propensities will be sparse, which not only has a good biological interpretation but also helps to control the overlapping rate among detected complexes. The Laplacian regularizer makes the estimators of propensities smoother on interaction networks. Experimental results on three yeast PPI networks show that RSGNM outperforms six previous competing algorithms in terms of the quality of detected complexes. In addition, RSGNM is able to detect overlapping complexes and complexes including peripheral proteins simultaneously. These results give new insights into the importance of generative network models in protein complex identification.

20.

Genome wide, supercoiling-dependent in vivo binding of a viral protein involved in DNA replication and transcriptional control

González-Huici V Salas M Hermoso JM 《Nucleic acids research》2004,32(8):2306-2314
