
1.

Automated genome sequence analysis and annotation. Cited by: 5 total (self-citations: 0, citations by others: 5)

M A Andrade N P Brown C Leroy S Hoersch A de Daruvar C Reich A Franchini J Tamames A Valencia C Ouzounis C Sander 《Bioinformatics (Oxford, England)》1999,15(5):391-412

MOTIVATION: Large-scale genome projects generate a rapidly increasing number of sequences, most of them biochemically uncharacterized. Research in bioinformatics contributes to the development of methods for the computational characterization of these sequences. However, the installation and application of these methods require experience and are time consuming. RESULTS: We present here an automatic system for preliminary functional annotation of protein sequences that has been applied to the analysis of sets of sequences from complete genomes, both to refine overall performance and to make new discoveries comparable to those made by human experts. The GeneQuiz system includes a Web-based browser that allows examination of the evidence leading to an automatic annotation and offers additional information, views of the results, and links to biological databases that complement the automatic analysis. System structure and operating principles concerning the use of multiple sequence databases, underlying sequence analysis tools, lexical analyses of database annotations and decision criteria for functional assignments are detailed. The system makes automatic quality assessments of results based on prior experience with the underlying sequence analysis tools; overall error rates in functional assignment are estimated at 2.5-5% for cases annotated with highest reliability ('clear' cases). Sources of over-interpretation of results are discussed with proposals for improvement. A conservative definition for reporting 'new findings' that takes account of database maturity is presented along with examples of possible kinds of discoveries (new function, family and superfamily) made by the system. System performance in relation to sequence database coverage, database dynamics and database search methods is analysed, demonstrating the inherent advantages of an integrated automatic approach using multiple databases and search methods applied in an objective and repeatable manner. 
AVAILABILITY: The GeneQuiz system is publicly available for analysis of protein sequences through a Web server at http://www.sander.ebi.ac.uk/gqsrv/submit

2.

AGeS: a software system for microbial genome sequence annotation

Kumar K Desai V Cheng L Khitrov M Grover D Satya RV Yu C Zavaljevski N Reifman J 《PloS one》2011,6(3):e17469

BACKGROUND: The annotation of genomes from next-generation sequencing platforms needs to be rapid, high-throughput, and fully integrated and automated. Although a few Web-based annotation services have recently become available, they may not be the best solution for researchers that need to annotate a large number of genomes, possibly including proprietary data, and store them locally for further analysis. To address this need, we developed a standalone software application, the Annotation of microbial Genome Sequences (AGeS) system, which incorporates publicly available and in-house-developed bioinformatics tools and databases, many of which are parallelized for high-throughput performance. METHODOLOGY: The AGeS system supports three main capabilities. The first is the storage of input contig sequences and the resulting annotation data in a central, customized database. The second is the annotation of microbial genomes using an integrated software pipeline, which first analyzes contigs from high-throughput sequencing by locating genomic regions that code for proteins, RNA, and other genomic elements through the Do-It-Yourself Annotation (DIYA) framework. The identified protein-coding regions are then functionally annotated using the in-house-developed Pipeline for Protein Annotation (PIPA). The third capability is the visualization of annotated sequences using GBrowse. To date, we have implemented these capabilities for bacterial genomes. AGeS was evaluated by comparing its genome annotations with those provided by three other methods. Our results indicate that the software tools integrated into AGeS provide annotations that are in general agreement with those provided by the compared methods. This is demonstrated by a >94% overlap in the number of identified genes, a significant number of identical annotated features, and a >90% agreement in enzyme function predictions.

3.

Predicting Shine-Dalgarno sequence locations exposes genome annotation errors

Starmer J Stomp A Vouk M Bitzer D 《PLoS computational biology》2006,2(5):e57


4.

Functional annotation from the genome sequence of the giant panda

Tong Huo Yinjie Zhang Jianping Lin 《Protein & Cell》2012,3(8):602

The giant panda is one of the most critically endangered species due to the fragmentation and loss of its habitat. Studying the functions of proteins in this animal, especially specific trait-related proteins, is therefore necessary to protect the species. In this work, the functions of these proteins were investigated using the genome sequence of the giant panda. Data on 21,001 proteins and their functions were stored in the Giant Panda Protein Database, in which the proteins were divided into two groups: 20,179 proteins whose functions can be predicted by GeneScan formed the known-function group, whereas 822 proteins whose functions cannot be predicted by GeneScan comprised the unknown-function group. For the known-function group, we further classified the proteins by molecular function, biological process, cellular component, and tissue specificity. For the unknown-function group, we developed a strategy in which the proteins were filtered by cross-Blast to identify panda-specific proteins under the assumption that proteins related to the panda-specific traits in the unknown-function group exist. After this filtering procedure, we identified 32 proteins (2 of which are membrane proteins) specific to the giant panda genome as compared against the dog and horse genomes. Based on their amino acid sequences, these 32 proteins were further analyzed by functional classification using SVM-Prot, motif prediction using MyHits, and interacting protein prediction using the Database of Interacting Proteins. Nineteen proteins were predicted to be zinc-binding proteins, thus affecting the activities of nucleic acids. The 32 panda-specific proteins will be further investigated by structural and functional analysis.

5.

Reproducibility in genome sequence annotation: the Plasmodium falciparum chromosome 2 case.

S Tsoka V Promponas C A Ouzounis 《FEBS letters》1999,451(3):354-355


6.

Appropriateness: the next frontier.

R. H. Brook 《BMJ (Clinical research ed.)》1994,308(6923):218-219


7.

Evaluation of annotation strategies using an entire genome sequence. Cited by: 2 total (self-citations: 0, citations by others: 2)

Iliopoulos I Tsoka S Andrade MA Enright AJ Carroll M Poullet P Promponas V Liakopoulos T Palaios G Pasquier C Hamodrakas S Tamames J Yagnik AT Tramontano A Devos D Blaschke C Valencia A Brett D Martin D Leroy C Rigoutsos I Sander C Ouzounis CA 《Bioinformatics (Oxford, England)》2003,19(6):717-726

MOTIVATION: Genome-wide functional annotation either by manual or automatic means has raised considerable concerns regarding the accuracy of assignments and the reproducibility of methodologies. In addition, a performance evaluation of automated systems that attempt to tackle sequence analyses rapidly and reproducibly is generally missing. In order to quantify the accuracy and reproducibility of function assignments on a genome-wide scale, we have re-annotated the entire genome sequence of Chlamydia trachomatis (serovar D), in a collaborative manner. RESULTS: We have encoded all annotations in a structured format to allow further comparison and data exchange and have used a scale that records the different levels of potential annotation errors according to their propensity to propagate in the database due to transitive function assignments. We conclude that genome annotation may entail a considerable amount of errors, ranging from simple typographical errors to complex sequence analysis problems. The most surprising result of this comparative study is that automatic systems might perform as well as the teams of experts annotating genome sequences.

8.

Deep Imaging: the next frontier in microscopy

Vassilis Roukos Tom Misteli 《Histochemistry and cell biology》2014,142(2):125-131

The microscope is the quintessential tool for discovery in cell biology. From its earliest incarnation as a tool to make the unseen visible, microscopes have been at the center of most revolutionizing developments in cell biology, histology and pathology. Major quantum leaps in imaging involved the dramatic improvements in resolution to see increasingly smaller structures, methods to visualize specific molecules inside of cells and tissues, and the ability to peer into living cells to study dynamics of molecules and cellular structures. The latest revolution in microscopy is Deep Imaging, the ability to look at very large numbers of samples by high-throughput microscopy at high spatial and temporal resolution. This approach is rooted in the development of fully automated high-resolution microscopes and the application of advanced computational image analysis and mining methods. Deep Imaging is enabling two novel, powerful approaches in cell biology: the ability to image thousands of samples with high optical precision allows every discernible morphological pattern to be used as a read-out in large-scale imaging-based screens, particularly in conjunction with RNAi-based screening technology; in addition, the capacity to capture large numbers of images, combined with advanced computational image analysis methods, has also opened the door to detect and analyze very rare cellular events. These two applications of Deep Imaging are revolutionizing cell biology.

9.

RiceGAAS: an automated annotation system and database for rice genome sequence. Cited by: 27 total (self-citations: 0, citations by others: 27)


Katsumi Sakata Yoshiaki Nagamura Hisataka Numa Baltazar A. Antonio Hideki Nagasaki Atsuko Idonuma Wakako Watanabe Yuji Shimizu Ikuo Horiuchi Takashi Matsumoto Takuji Sasaki Kenichi Higo 《Nucleic acids research》2002,30(1):98-102

An extensive effort of the International Rice Genome Sequencing Project (IRGSP) has resulted in rapid accumulation of genome sequence, and >137 Mb has already been made available to the public domain as of August 2001. This requires a high-throughput annotation scheme to extract biologically useful and timely information from the sequence data on a regular basis. A new automated annotation system and database called Rice Genome Automated Annotation System (RiceGAAS) has been developed to execute a reliable and up-to-date analysis of the genome sequence as well as to store and retrieve the results of annotation. The system has the following functional features: (i) collection of rice genome sequences from GenBank; (ii) execution of gene prediction and homology search programs; (iii) integration of results from various analyses and automatic interpretation of coding regions; (iv) re-execution of analysis, integration and automatic interpretation with the latest entries in reference databases; (v) integrated visualization of the stored data using web-based graphical view. RiceGAAS also has a data submission mechanism that allows public users to perform fully automated annotation of their own sequences. The system can be accessed at http://RiceGAAS.dna.affrc.go.jp/.

10.

Large-scale prokaryotic gene prediction and comparison to genome annotation. Cited by: 4 total (self-citations: 0, citations by others: 4)

Nielsen P Krogh A 《Bioinformatics (Oxford, England)》2005,21(24):4322-4329

MOTIVATION: Prokaryotic genomes are sequenced and annotated at an increasing rate. The methods of annotation vary between sequencing groups. It makes genome comparison difficult and may lead to propagation of errors when questionable assignments are adapted from one genome to another. Genome comparison either on a large or small scale would be facilitated by using a single standard for annotation, which incorporates a transparency of why an open reading frame (ORF) is considered to be a gene. RESULTS: A total of 143 prokaryotic genomes were scored with an updated version of the prokaryotic genefinder EasyGene. Comparison of the GenBank and RefSeq annotations with the EasyGene predictions reveals that in some genomes up to approximately 60% of the genes may have been annotated with a wrong start codon, especially in the GC-rich genomes. The fractional difference between annotated and predicted confirms that too many short genes are annotated in numerous organisms. Furthermore, genes might be missing in the annotation of some of the genomes. We predict 41 of 143 genomes to be over-annotated by >5%, meaning that too many ORFs are annotated as genes. We also predict that 12 of 143 genomes are under-annotated. These results are based on the difference between the number of annotated genes not found by EasyGene and the number of predicted genes that are not annotated in GenBank. We argue that the average performance of our standardized and fully automated method is slightly better than the annotation.
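
The over- and under-annotation measure described in this abstract can be illustrated with a minimal Python sketch. It takes the difference between annotated genes not found by the gene finder and predicted genes missing from the annotation. Dividing by the total number of annotated genes, the symmetric 5% cutoff for under-annotation, and the function names are assumptions made for illustration; the abstract states only the >5% over-annotation threshold, and this is not the authors' implementation.

```python
def annotation_balance(annotated_not_predicted: int,
                       predicted_not_annotated: int,
                       total_annotated: int) -> float:
    """Signed fractional difference; positive values suggest over-annotation.

    Normalizing by total_annotated is an assumption, not taken from the paper.
    """
    if total_annotated <= 0:
        raise ValueError("total_annotated must be positive")
    return (annotated_not_predicted - predicted_not_annotated) / total_annotated


def classify(balance: float, threshold: float = 0.05) -> str:
    """Label a genome using a hypothetical symmetric threshold (default 5%)."""
    if balance > threshold:
        return "over-annotated"
    if balance < -threshold:
        return "under-annotated"
    return "consistent"


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    balance = annotation_balance(300, 100, 2000)
    print(classify(balance))
```

With the hypothetical counts above, 300 annotated genes are missed by the predictor and 100 predicted genes are unannotated, so the genome falls on the over-annotated side of the assumed cutoff.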

11.

Microbial metabolites: the next frontier in human milk

《Trends in microbiology》2022,30(5):408-410


12.

Refined annotation of the Arabidopsis genome by complete expressed sequence tag mapping

Zhu W Schlueter SD Brendel V 《Plant physiology》2003,132(2):469-484

Expressed sequence tags (ESTs) currently encompass more entries in the public databases than any other form of sequence data. Thus, EST data sets provide a vast resource for gene identification and expression profiling. We have mapped the complete set of 176,915 publicly available Arabidopsis EST sequences onto the Arabidopsis genome using GeneSeqer, a spliced alignment program incorporating sequence similarity and splice site scoring. About 96% of the available ESTs could be properly aligned with a genomic locus, with the remaining ESTs deriving from organelle genomes and non-Arabidopsis sources or displaying insufficient sequence quality for alignment. The mapping provides verified sets of EST clusters for evaluation of EST clustering programs. Analysis of the spliced alignments suggests corrections to current gene structure annotation and provides examples of alternative and non-canonical pre-mRNA splicing. All results of this study were parsed into a database and are accessible via a flexible Web interface at http://www.plantgdb.org/AtGDB/.

13.

Errors in genome annotation. Cited by: 14 total (self-citations: 0, citations by others: 14)

Brenner SE 《Trends in genetics : TIG》1999,15(4):132-133


14.

Complete genome sequence and updated annotation of Desulfovibrio alaskensis G20

Hauser LJ Land ML Brown SD Larimer F Keller KL Rapp-Giles BJ Price MN Lin M Bruce DC Detter JC Tapia R Han CS Goodwin LA Cheng JF Pitluck S Copeland A Lucas S Nolan M Lapidus AL Palumbo AV Wall JD 《Journal of bacteriology》2011,193(16):4268-4269

Desulfovibrio alaskensis G20 (formerly Desulfovibrio desulfuricans G20) is a Gram-negative mesophilic sulfate-reducing bacterium (SRB), known to corrode ferrous metals and to reduce toxic radionuclides and metals such as uranium and chromium to sparingly soluble and less toxic forms. We present the 3.7-Mb genome sequence to provide insights into its physiology.

15.

Evolutionary annotation of the genome

Easteal S 《Molecular biology and evolution》2000,17(12):1775


16.

Regulation of new biomedical technologies: the next frontier

《Current opinion in biotechnology》2001,12(3):297-298


17.

Intrinsic errors in genome annotation. Cited by: 11 total (self-citations: 0, citations by others: 11)

Devos D Valencia A 《Trends in genetics : TIG》2001,17(8):429-431

Genome sequencing is usually followed by routine annotation of protein function based on the assumption that similar sequences will have similar functions. Here, we introduce a simple calculation to estimate the magnitude of any possible annotation errors. We counted the number of discrepancies in the annotation of well-established sets of similar proteins and extrapolated these values to the pairs of similar sequences used for the annotation of different microbial genomes. We conclude that the number of potential errors in the prediction of detailed functions is higher than is usually believed.
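
A count-and-extrapolate estimate of the kind this abstract describes might look like the following Python sketch. It measures a discrepancy rate on reference pairs of similar proteins, then applies that rate to the similar pairs used to annotate a genome. Grouping pairs into similarity bins, the bin labels, the function names, and all numbers are assumptions for illustration; the abstract does not give the actual calculation.

```python
def discrepancy_rates(reference_pairs):
    """Fraction of reference pairs per bin whose annotations disagree.

    reference_pairs: iterable of (bin_label, functions_agree) tuples.
    Binning by similarity level is a hypothetical choice.
    """
    totals, mismatches = {}, {}
    for bin_label, functions_agree in reference_pairs:
        totals[bin_label] = totals.get(bin_label, 0) + 1
        if not functions_agree:
            mismatches[bin_label] = mismatches.get(bin_label, 0) + 1
    return {b: mismatches.get(b, 0) / n for b, n in totals.items()}


def expected_errors(rates, genome_pair_counts):
    """Extrapolate: sum over bins of (discrepancy rate x number of pairs used)."""
    return sum(rates[b] * n for b, n in genome_pair_counts.items())


if __name__ == "__main__":
    # Hypothetical reference set and genome pair counts.
    reference = [("30-40%", True), ("30-40%", False),
                 ("40-50%", True), ("40-50%", True),
                 ("40-50%", True), ("40-50%", False)]
    rates = discrepancy_rates(reference)
    print(expected_errors(rates, {"30-40%": 100, "40-50%": 200}))
```

The estimate is only as good as the reference sets: a bin with few reference pairs gives a noisy rate, which is one reason a real analysis would need far larger curated sets than this toy input.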

18.

Into the unknown: expression profiling without genome sequence information in CHO by next generation sequencing

Fabian Birzele Jochen Schaub Werner Rust Christoph Clemens Patrick Baum Hitto Kaufmann Andreas Weith Torsten W. Schulz Tobias Hildebrandt 《Nucleic acids research》2010,38(12):3999-4010


19.

Exploring the next frontier of mouse vision

Niell CM 《Neuron》2011,72(6):889-892

Two studies in this issue of Neuron apply in vivo functional imaging techniques to map out and record from mouse extrastriate visual cortex. They find that distinct areas show hallmarks of processing for different types of visual input and provide a promising path forward to investigate how complex image analysis is performed in the mouse visual system.

20.

C-22: The next frontier: Vascularized tissues

《Cryobiology》2015,70(3):508
