Similar articles
 20 similar articles found (search time: 15 ms)
1.
2.
Automated genome sequence analysis and annotation.   Total citations: 5 (self-citations: 0; citations by others: 5)
MOTIVATION: Large-scale genome projects generate a rapidly increasing number of sequences, most of them biochemically uncharacterized. Research in bioinformatics contributes to the development of methods for the computational characterization of these sequences. However, the installation and application of these methods require experience and are time consuming. RESULTS: We present here an automatic system for preliminary functional annotation of protein sequences that has been applied to the analysis of sets of sequences from complete genomes, both to refine overall performance and to make new discoveries comparable to those made by human experts. The GeneQuiz system includes a Web-based browser that allows examination of the evidence leading to an automatic annotation and offers additional information, views of the results, and links to biological databases that complement the automatic analysis. System structure and operating principles concerning the use of multiple sequence databases, underlying sequence analysis tools, lexical analyses of database annotations and decision criteria for functional assignments are detailed. The system makes automatic quality assessments of results based on prior experience with the underlying sequence analysis tools; overall error rates in functional assignment are estimated at 2.5-5% for cases annotated with highest reliability ('clear' cases). Sources of over-interpretation of results are discussed with proposals for improvement. A conservative definition for reporting 'new findings' that takes account of database maturity is presented along with examples of possible kinds of discoveries (new function, family and superfamily) made by the system. System performance in relation to sequence database coverage, database dynamics and database search methods is analysed, demonstrating the inherent advantages of an integrated automatic approach using multiple databases and search methods applied in an objective and repeatable manner. 
AVAILABILITY: The GeneQuiz system is publicly available for analysis of protein sequences through a Web server at http://www.sander.ebi.ac.uk/gqsrv/submit

3.
4.
5.
In view of the recent explosion in genome sequence data, and the 200 or more complete genome sequences currently available, the importance of genome-scale bioinformatics analysis is increasing rapidly. However, computational genome informatics analyses often lack a statistical assessment of their sensitivity to the completeness of the functional annotation. A pre-analysis method that automatically validates the sensitivity of computational genome analyses with regard to genome annotation completeness is therefore useful. In this report, we describe the Gene Prediction Accuracy Classification (GPAC) test, which provides statistical evidence of sensitivity by repeating the same analysis for five different gene groups (classified according to annotation accuracy level) and for randomly sampled gene groups with the same number of genes as each of the five classified groups. Variability in these results is then assessed, and if the results vary significantly with different data subsets, the analysis is considered "sensitive" to annotation completeness, and careful selection of data is advised prior to the actual in silico analysis. The GPAC test was applied to the analyses of Sakai et al., 2001, and Ohno et al., 2001, and revealed that the analysis of Ohno et al. was more sensitive to annotation completeness. This shows that GPAC can be used to ascertain the sensitivity of an analysis. The GPAC benchmarking software is freely available in the latest G-language Genome Analysis Environment package, at http://www.g-language.org/.
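The comparison described in this abstract (the same analysis repeated on five accuracy-classified gene groups and on random groups of equal size) can be sketched as follows. This is only an illustration under stated assumptions: the function names `gpac_like_check` and `is_sensitive`, the use of the mean as the analysis, and the "outside the range of the random groups" criterion are all hypothetical. The abstract does not say which statistical test GPAC applies, and this is not the G-language implementation.

```python
import random
from statistics import mean


def gpac_like_check(values_by_class, statistic=mean, n_random=200, seed=0):
    """Compare a statistic on each annotation-accuracy class with the same
    statistic on randomly sampled gene groups of equal size.

    values_by_class: dict mapping a class label to a list of per-gene values.
    Returns a dict: label -> (observed, low, high, outside), where (low, high)
    is the range seen over the random groups (an assumed criterion).
    """
    rng = random.Random(seed)
    # Pool all genes so that random groups ignore the accuracy classes.
    pool = [v for vals in values_by_class.values() for v in vals]
    report = {}
    for label, vals in values_by_class.items():
        observed = statistic(vals)
        randoms = [statistic(rng.sample(pool, len(vals))) for _ in range(n_random)]
        low, high = min(randoms), max(randoms)
        report[label] = (observed, low, high, not (low <= observed <= high))
    return report


def is_sensitive(report):
    """Flag the analysis as sensitive if any class falls outside its random range."""
    return any(outside for *_, outside in report.values())
```

With toy data in which one accuracy class has very different values from the rest, `is_sensitive(gpac_like_check(data))` returns `True`; when all classes have identical values it returns `False`.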

6.
Zhang Y, Yin Y, Chen Y, Gao G, Yu P, Luo J, Jiang Y. BMC Genomics 2003, 4(1):42

Background  

Many model proteomes or "complete" sets of proteins of given organisms are now publicly available. Much effort has been invested in computational annotation of those "draft" proteomes. Motif or domain based algorithms play a pivotal role in functional classification of proteins. Employing most available computational algorithms, mainly motif or domain recognition algorithms, we set out to develop an online proteome annotation system with integrated proteome annotation data to complement existing resources.

7.
Mining long noncoding RNA in livestock   Total citations: 2 (self-citations: 0; citations by others: 2)

8.

Background  

The number of sequences compiled in many genome projects is growing exponentially, but most of them have not been characterized experimentally. An automatic annotation scheme is urgently needed to close the gap between the amount of new sequences produced and reliable functional annotation. This work proposes rules for automatically classifying fungal genes. The approach involves elucidating the enzyme classification rules hidden in the UniProt protein knowledgebase and then applying them for classification. The association algorithm Apriori is used to mine the relationship between the enzyme class and significant InterPro entries. The candidate rules are evaluated for their classificatory capacity.
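A minimal sketch of Apriori-style rule mining between InterPro entries and enzyme classes is given below, as an illustration of the idea in this abstract rather than the authors' pipeline. The function name `mine_rules`, the thresholds, the cap on itemset size, and the placeholder identifiers (`IPR_A`, `IPR_B`, `IPR_C`) are all assumptions; real input would come from UniProt and InterPro records.

```python
from collections import Counter
from itertools import combinations


def mine_rules(proteins, min_support=2, min_confidence=0.8, max_size=2):
    """Mine rules of the form {InterPro entries} -> enzyme class.

    proteins: list of (set_of_interpro_ids, enzyme_class) pairs.
    Returns a list of (antecedent, enzyme_class, support, confidence), where
    support is the number of proteins matching both sides of the rule.
    """
    rules = []
    frequent_prev = None  # frequent itemsets from the previous level
    for size in range(1, max_size + 1):
        itemset_counts = Counter()
        joint_counts = Counter()
        for entries, ec in proteins:
            for combo in combinations(sorted(entries), size):
                # Apriori pruning: skip candidates with an infrequent subset.
                if frequent_prev is not None and any(
                    sub not in frequent_prev
                    for sub in combinations(combo, size - 1)
                ):
                    continue
                itemset_counts[combo] += 1
                joint_counts[(combo, ec)] += 1
        frequent_prev = {c for c, n in itemset_counts.items() if n >= min_support}
        for (combo, ec), n in joint_counts.items():
            confidence = n / itemset_counts[combo]
            if n >= min_support and confidence >= min_confidence:
                rules.append((combo, ec, n, confidence))
    return rules


# Toy input with placeholder identifiers (not real InterPro accessions).
toy_proteins = [
    ({"IPR_A", "IPR_B"}, "EC 1"),
    ({"IPR_A"}, "EC 1"),
    ({"IPR_A", "IPR_C"}, "EC 1"),
    ({"IPR_C"}, "EC 2"),
    ({"IPR_C", "IPR_B"}, "EC 2"),
]
print(mine_rules(toy_proteins))  # [(('IPR_A',), 'EC 1', 3, 1.0)]
```

On this toy input only `IPR_A` yields a rule: it occurs in three proteins, all of class `EC 1`, whereas `IPR_C` points to `EC 2` in only two of its three occurrences and falls below the confidence threshold.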

9.
The well-established inaccuracy of purely computational methods for annotating genome sequences necessitates an interactive tool to allow biological experts to refine these approximations by viewing and independently evaluating the data supporting each annotation. Apollo was developed to meet this need, enabling curators to inspect genome annotations closely and edit them. FlyBase biologists successfully used Apollo to annotate the Drosophila melanogaster genome and it is increasingly being used as a starting point for the development of customized annotation editing tools for other genome projects.

10.
The JCVI metagenomics analysis pipeline provides for the efficient and consistent annotation of shotgun metagenomics sequencing data for sampling communities of prokaryotic organisms. The process can be equally applied to individual sequence reads from traditional Sanger capillary electrophoresis sequences, newer technologies such as 454 pyrosequencing, or sequence assemblies derived from one or more of these data types. It includes the analysis of both coding and non-coding genes, whether full-length or, as is often the case for shotgun metagenomics, fragmentary. The system is designed to provide the best-supported conservative functional annotation based on a combination of trusted homology-based scientific evidence and computational assertions and an annotation value hierarchy established through extensive manual curation. The functional annotation attributes assigned by this system include gene name, gene symbol, GO terms, EC numbers, and JCVI functional role categories.

11.
12.
13.
The currently available data on protein sequences far exceeds the experimental capacity to annotate their function. Annotation in silico, i.e., using computational methods, is therefore becoming increasingly important. This annotation is inevitably a prediction, but it can be an important starting point for further experimental studies. Here we present a method for prediction of protein functional sites, SDPsite, based on the identification of protein specificity determinants. Taking as input a protein sequence alignment and a phylogenetic tree, the algorithm predicts conserved positions and specificity determinants, maps them onto the protein's 3D structure, and searches for clusters of the predicted positions. Comparison of the obtained predictions with experimental data and with the performance of several other methods for prediction of functional sites reveals that SDPsite agrees well with experiment and outperforms most of the previously available methods. SDPsite is publicly available at http://bioinf.fbb.msu.ru/SDPsite.
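The two kinds of positions named in this abstract can be illustrated with a deliberately simplified sketch. It assumes the specificity groups are already given (SDPsite derives them from the phylogenetic tree) and uses plain frequency rules instead of the statistics SDPsite applies. The function names and the threshold are hypothetical.

```python
from collections import Counter


def conserved_positions(alignment, threshold=0.9):
    """Return column indices where one residue fills at least `threshold`
    of the rows (gaps count against conservation)."""
    positions = []
    for i, column in enumerate(zip(*alignment)):
        residues = [r for r in column if r != "-"]
        if not residues:
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        if top_count / len(column) >= threshold:
            positions.append(i)
    return positions


def specificity_positions(groups):
    """Return column indices that are invariant inside every group but
    differ between groups.

    groups: dict mapping a group name to a list of aligned sequences.
    """
    length = len(next(iter(groups.values()))[0])
    positions = []
    for i in range(length):
        per_group = [{seq[i] for seq in seqs} for seqs in groups.values()]
        invariant = all(len(s) == 1 and "-" not in s for s in per_group)
        if invariant and len(set.union(*per_group)) > 1:
            positions.append(i)
    return positions


# Toy alignment: two groups of two sequences each.
toy_groups = {"A": ["MKLV", "MKLV"], "B": ["MRLV", "MRLI"]}
toy_alignment = [seq for seqs in toy_groups.values() for seq in seqs]
print(conserved_positions(toy_alignment))  # [0, 2]
print(specificity_positions(toy_groups))   # [1]
```

In the toy alignment, columns 0 and 2 are identical in all sequences, while column 1 is K throughout group A and R throughout group B, which is the pattern a specificity determinant is expected to show.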

14.
The past decade has seen the completion of numerous whole-genome sequencing projects, beginning with bacterial genomes and continuing with eukaryotic species from different phyla: fungi, plants and animals. In addition, more biological information is being produced and shared thanks to information exchange systems, and more biological concepts, as well as more bioinformatics tools, are available. In this article, we describe how concepts from evolutionary biology, as well as computer science, are useful for a better understanding of biology in general and genome annotation in particular. The genome annotation process consists of taking the raw DNA sequence produced, for example, by genome sequencing projects, adding the layers of analysis and interpretation necessary to extract its biological significance, and placing it in the context of our understanding of biological processes. Genome annotation is a multistep process falling into two broad categories: structural and functional annotation.

15.
Now that the human and mouse genomes have been sequenced, the annotation of these sequences with biological functions is an important challenge in genomic research. A major tool for analysing gene function at the organismal level is the analysis of mutant phenotypes. Because of its genetic and physiological similarity to man, the mouse has become the model organism of choice for the study of genetic diseases. In addition, there is at the moment no other vertebrate for which versatile techniques to manipulate the genome are as well developed. Several mouse mutagenesis projects have provided the proof-of-principle that a systematic and comprehensive mutagenesis of every gene in the mammalian genome will be feasible. An exhaustive functional annotation of the mammalian genome can only be achieved through a combination of phenotype- and gene-driven approaches in large- and small-scale academic and private projects. Major challenges will be to develop standardised phenotyping protocols for the clinical and pathological characterisation of mouse mutants, to improve mutation detection methods, and to disseminate resources and data. Beyond gene annotation, it will be necessary to understand how gene functions are integrated into the complex network of regulatory interactions in the cell.

16.
17.
MOTIVATION: Reliable identification of protein families is key to phylogenetic analysis, functional annotation and the exploration of protein function diversity in a given phylogenetic branch. As more and more complete genomes are sequenced, there is a need for powerful and reliable algorithms that facilitate protein family construction. RESULTS: We have formulated the problem of protein family construction as an instance of consensus clustering, for which we designed a novel algorithm that is computationally efficient in practice and produces high quality results. Our algorithm uses an election method to construct consensus families from competing clustering computations. Our consensus clustering algorithm is tailored to serve the specific needs of comparative genomics projects. First, it provides a robust means to incorporate results from different and complementary clustering methods, thus avoiding the need for an a priori choice that may introduce computational bias in the results. Second, it is suited to large-scale projects owing to its practical efficiency. Third, it produces high quality results where families tend to represent groupings by biological function. AVAILABILITY: This method has been used for the Génolevures project to compute protein families of Hemiascomycetous yeasts. The data are available online at http://cbi.labri.fr/Genolevures/fam/
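The abstract does not describe the election method itself, so the sketch below swaps in a simple, well-known alternative to show what consensus clustering does: two proteins are joined when a strict majority of the input clusterings place them together, and the joined pairs are merged with a union-find structure. The function name `consensus_families` and the toy protein labels are hypothetical, and this should not be read as the authors' algorithm.

```python
from itertools import combinations


def consensus_families(clusterings, items):
    """Build consensus groups by majority vote over pairwise co-membership.

    clusterings: list of dicts mapping each item to a cluster label.
    items: list of all items (every clustering must label every item).
    Returns a sorted list of sorted families.
    """
    parent = {x: x for x in items}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        votes = sum(1 for c in clusterings if c[a] == c[b])
        if 2 * votes > len(clusterings):  # strict majority
            parent[find(a)] = find(b)

    families = {}
    for x in items:
        families.setdefault(find(x), []).append(x)
    return sorted(sorted(f) for f in families.values())


# Three competing clusterings of four toy proteins.
toy_items = ["p1", "p2", "p3", "p4"]
toy_clusterings = [
    {"p1": 0, "p2": 0, "p3": 1, "p4": 1},
    {"p1": 0, "p2": 0, "p3": 0, "p4": 1},
    {"p1": 0, "p2": 1, "p3": 2, "p4": 2},
]
print(consensus_families(toy_clusterings, toy_items))
# [['p1', 'p2'], ['p3', 'p4']]
```

The pairwise loop is quadratic in the number of proteins, so this form is only suitable for small examples; the abstract's emphasis on practical efficiency for large-scale projects implies a more scalable design.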

18.
Sequencing microbial genomes is important because microbes carry antibiotic and pathogenic activities. However, even with the help of new assembly software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenic or antibiotic genes are carried in genomic islands. A quick genomic island (GI) prediction method is therefore useful for genomes whose sequencing is still in progress. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw), which integrates a sequence assembly tool, a functional annotation pipeline, and a high-performance GI prediction module based on a support vector machine (SVM) method called genomic island genomic profile scanning (GI-GPS). Draft genomes from ongoing genome projects, in contigs or scaffolds, can be submitted to our Web server, which returns the functional annotation and highly probable GI predictions. GI-POP is a comprehensive annotation Web server designed for the analysis of ongoing genome projects. Researchers can perform annotation and obtain pre-analytic information, including possible GIs, coding/non-coding sequences and functional analysis, from their draft genomes. This pre-analytic system can provide useful information for finishing a genome sequencing project.

19.
20.
The flood of sequence data resulting from the large number of current genome projects has increased the need for a flexible, open source genome annotation system, which so far has not existed. To account for the individual needs of different projects, such a system should be modular and easily extensible. We present a genome annotation system for prokaryote genomes, which is well tested and readily adaptable to different tasks. The modular system was developed using an object-oriented approach, and it relies on a relational database backend. Using a well-defined application programming interface (API), the system can be linked easily to other systems. GenDB supports manual as well as automatic annotation strategies. The software is currently in use in more than a dozen microbial genome annotation projects. In addition to its use as a production genome annotation system, it can be employed as a flexible framework for the large-scale evaluation of different annotation strategies. The system is open source.
