
1.

Applying negative rule mining to improve genome annotation

Irena I Artamonova Goar Frishman Dmitrij Frishman 《BMC bioinformatics》2007,8(1):261

Background

Unsupervised annotation of proteins by software pipelines suffers from very high error rates. Spurious functional assignments are usually caused by unwarranted homology-based transfer of information from existing database entries to the new target sequences. We have previously demonstrated that data mining in large sequence annotation databanks can help identify annotation items that are strongly associated with each other, and that exceptions from strong positive association rules often point to potential annotation errors. Here we investigate the applicability of negative association rule mining to revealing erroneously assigned annotation items.
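The idea of a negative association rule ("entries carrying item A usually do not carry item B") can be illustrated with a minimal Python sketch. This is not the authors' implementation: the annotation items, the toy records, and the function names `negative_rule_confidence` and `violations` are all assumptions made for illustration only.

```python
def negative_rule_confidence(records, a, b):
    """Confidence of the negative rule 'a present => b absent'.

    records is a list of sets of annotation items (hypothetical data).
    """
    with_a = [r for r in records if a in r]
    if not with_a:
        return 0.0
    return sum(b not in r for r in with_a) / len(with_a)


def violations(records, a, b):
    """Indices of records that carry both a and b, i.e. break the rule."""
    return [i for i, r in enumerate(records) if a in r and b in r]


# Toy annotation records; the item names are invented for this example.
records = (
    [{"Transmembrane", "Transport"}] * 4
    + [{"Transmembrane", "Receptor"}] * 5
    + [{"Transmembrane", "Cytoplasm"}]      # candidate annotation error
    + [{"Cytoplasm", "Kinase"}] * 3
)

confidence = negative_rule_confidence(records, "Transmembrane", "Cytoplasm")
suspects = violations(records, "Transmembrane", "Cytoplasm")
```

In this toy data set the rule holds for nine of the ten records carrying the first item, so the single record that breaks it is the one a curator would inspect first.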

2.

Automatic annotation of protein function

Valencia A 《Current opinion in structural biology》2005,15(3):267-274

The annotation of protein function at genomic scale is essential for day-to-day work in biology and for any systematic approach to the modeling of biological systems. Currently, functional annotation is essentially based on the expansion of the relatively small number of experimentally determined functions to large collections of proteins. The task of systematic annotation faces formidable practical problems related to the accuracy of the input experimental information, the reliability of current systems for transferring information between related sequences, and the reproducibility of the links between database information and the original experiments reported in publications. These technical difficulties merely lie on the surface of the deeper problem of the evolution of protein function in the context of protein sequences and structures. Given the mixture of technical and scientific challenges, it is not surprising that errors are introduced, and expanded, in database annotations. In this situation, a more realistic option is the development of a reliability index for database annotations, instead of depending exclusively on efforts to correct databases. Several groups have attempted to compare the database annotations of similar proteins, which constitutes the first steps toward the calibration of the relationship between sequence and annotation space.

3.

Sma3s: A Three-Step Modular Annotator for Large Sequence Datasets

Antonio Muñoz-Mérida Enrique Viguera M. Gonzalo Claros Oswaldo Trelles Antonio J. Pérez-Pulido 《DNA research》2014,21(4):341-353


4.

Percolation of annotation errors through hierarchically structured protein sequence databases

Gilks WR Audit B de Angelis D Tsoka S Ouzounis CA 《Mathematical biosciences》2005,193(2):223-234


5.

Evaluation of annotation strategies using an entire genome sequence (total citations: 2; self-citations: 0; citations by others: 2)

Iliopoulos I Tsoka S Andrade MA Enright AJ Carroll M Poullet P Promponas V Liakopoulos T Palaios G Pasquier C Hamodrakas S Tamames J Yagnik AT Tramontano A Devos D Blaschke C Valencia A Brett D Martin D Leroy C Rigoutsos I Sander C Ouzounis CA 《Bioinformatics (Oxford, England)》2003,19(6):717-726

MOTIVATION: Genome-wide functional annotation either by manual or automatic means has raised considerable concerns regarding the accuracy of assignments and the reproducibility of methodologies. In addition, a performance evaluation of automated systems that attempt to tackle sequence analyses rapidly and reproducibly is generally missing. In order to quantify the accuracy and reproducibility of function assignments on a genome-wide scale, we have re-annotated the entire genome sequence of Chlamydia trachomatis (serovar D), in a collaborative manner. RESULTS: We have encoded all annotations in a structured format to allow further comparison and data exchange and have used a scale that records the different levels of potential annotation errors according to their propensity to propagate in the database due to transitive function assignments. We conclude that genome annotation may entail a considerable amount of errors, ranging from simple typographical errors to complex sequence analysis problems. The most surprising result of this comparative study is that automatic systems might perform as well as the teams of experts annotating genome sequences.

6.

Comparative omics-driven genome annotation refinement: application across Yersiniae

Schrimpe-Rutledge AC Jones MB Chauhan S Purvine SO Sanford JA Monroe ME Brewer HM Payne SH Ansong C Frank BC Smith RD Peterson SN Motin VL Adkins JN 《PloS one》2012,7(3):e33903


7.

Recognition of transmembrane segments in proteins: review and consistency-based benchmarking of internet servers

Sadovskaya NS Sutormin RA Gelfand MS 《Journal of bioinformatics and computational biology》2006,4(5):1033-1056

Membrane proteins perform a number of crucial functions as transporters, receptors, and components of enzyme complexes. Identification of membrane proteins and prediction of their topology is thus an important part of genome annotation. We present here an overview of transmembrane segments in protein sequences, summarize data from large-scale genome studies, and report results of benchmarking of several popular internet servers.

8.

Comparisons of Annotation Predictions for Affymetrix GeneChips®

Stalteri M Harrison A 《Applied bioinformatics》2006,5(4):237-248


9.

Automatic assessment of alignment quality (total citations: 1; self-citations: 0; citations by others: 1)

Lassmann T Sonnhammer EL 《Nucleic acids research》2005,33(22):7120-7128

Multiple sequence alignments play a central role in the annotation of novel genomes. Given the biological and computational complexity of this task, the automatic generation of high-quality alignments remains challenging. Since multiple alignments are usually employed at the very start of data analysis pipelines, it is crucial to ensure high alignment quality. We describe a simple, yet elegant, solution to assess the biological accuracy of alignments automatically. Our approach is based on the comparison of several alignments of the same sequences. We introduce two functions to compare alignments: the average overlap score and the multiple overlap score. The former identifies difficult alignment cases by expressing the similarity among several alignments, while the latter estimates the biological correctness of individual alignments. We implemented both functions in the MUMSA program and demonstrate the overall robustness and accuracy of both functions on three large benchmark sets.
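As a rough illustration of comparing several alignments of the same sequences, the Python sketch below scores two alignments by the fraction of aligned residue pairs they share and then averages that score over all pairs of alignments. The abstract does not give the formula, so the definition used here (twice the shared pairs divided by the total pairs) is an assumption, and the function names are hypothetical; this is not the MUMSA code.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Set of residue pairs placed in the same column.

    alignment is a list of equal-length gapped strings ('-' is a gap),
    with sequences given in the same order in every alignment.
    """
    counters = [0] * len(alignment)   # next residue index per sequence
    pairs = set()
    for column in zip(*alignment):
        present = []
        for seq_index, char in enumerate(column):
            if char != "-":
                present.append((seq_index, counters[seq_index]))
                counters[seq_index] += 1
        pairs.update(combinations(present, 2))
    return pairs


def overlap(aln_a, aln_b):
    """Assumed overlap score: 2 * shared pairs / (pairs in A + pairs in B)."""
    pairs_a, pairs_b = aligned_pairs(aln_a), aligned_pairs(aln_b)
    total = len(pairs_a) + len(pairs_b)
    return 2 * len(pairs_a & pairs_b) / total if total else 1.0


def average_overlap(alignments):
    """Mean overlap over all pairs of alignments (needs at least two)."""
    scores = [overlap(a, b) for a, b in combinations(alignments, 2)]
    return sum(scores) / len(scores)


# Two toy alignments of the sequences ACGT and AGT.
aln1 = ["ACGT", "A-GT"]
aln2 = ["ACGT", "AG-T"]
```

A low average across many alignment programs would mark the sequence set as a difficult case under this reading of the abstract.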

10.

Plant protein-coding gene families: emerging bioinformatics approaches

Martinez M 《Trends in plant science》2011,16(10):558-567

Protein-coding gene families are sets of similar genes with a shared evolutionary origin and, generally, with similar biological functions. In plants, the size and role of gene families has been only partially addressed. However, suitable bioinformatics tools are being developed to cluster the enormous number of sequences currently available in databases. Specifically, comparative genomic databases promise to become powerful tools for gene family annotation in plant clades. In this review, I evaluate the data retrieved from various gene family databases, the ease with which they can be extracted and how useful the extracted information is.

11.

Automatic detection of false annotations via binary property clustering

Noam Kaplan Michal Linial 《BMC bioinformatics》2005,6(1):46

Background

Computational protein annotation methods occasionally introduce errors. False-positive (FP) errors are annotations that are mistakenly associated with a protein. Such false annotations introduce errors that may spread into databases through similarity with other proteins. Generally, methods used to minimize the chance for FPs result in decreased sensitivity or low throughput. We present a novel protein-clustering method that enables automatic separation of FP from true hits. The method quantifies the biological similarity between pairs of proteins by examining each protein's annotations, and then proceeds by clustering sets of proteins that received similar annotation into biological groups.
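A minimal Python sketch of the general idea, treating each annotation as a binary property and grouping proteins whose annotation sets are similar, might look as follows. The abstract does not name the similarity measure or the clustering procedure, so the Jaccard index, the single-linkage grouping, the 0.5 threshold, and the protein and annotation names are all assumptions for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two annotation sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(annotations, threshold=0.5):
    """Single-linkage grouping of proteins whose similarity >= threshold.

    annotations maps a protein name to its set of annotation terms.
    """
    names = list(annotations)
    parent = {name: name for name in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, p in enumerate(names):
        for q in names[i + 1:]:
            if jaccard(annotations[p], annotations[q]) >= threshold:
                parent[find(p)] = find(q)

    groups = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=sorted)


# Hypothetical proteins and annotation terms.
annotations = {
    "P1": {"kinase", "ATP-binding", "transferase"},
    "P2": {"kinase", "ATP-binding"},
    "P3": {"protease", "hydrolase"},
    "P4": {"protease", "hydrolase", "secreted"},
}
clusters = cluster(annotations)
```

Under this sketch, a hit whose annotations place it in a cluster far from the other hits for the same query would be a candidate false positive.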

12.

Functional annotation and analysis of Korean patented biological sequences using bioinformatics

Lee BW Kim TH Kim SK Kim SS Ryu GC Bhak J 《Molecules and cells》2006,21(2):269-275

A recent report of the Korean Intellectual Property Office (KIPO) showed that the number of biological sequence-based patents is rapidly increasing in Korea. We present biological features of Korean patented sequences through bioinformatic analysis. The analysis is divided into two steps. The first is an annotation step in which the patented sequences were annotated with the Reference Sequence (RefSeq) database. The second is an association step in which the patented sequences were linked to genes, diseases, pathways, and biological functions. We used Entrez Gene, Online Mendelian Inheritance in Man (OMIM), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Gene Ontology (GO) databases. Through the association analysis, we found that nearly 2.6% of human genes were associated with Korean patenting, compared to 20% of human genes in U.S. patents. The association between the biological functions and the patented sequences indicated that genes whose products act as hormones on defense responses in the extra-cellular environments were the most highly targeted for patenting. The analysis data are available at http://www.patome.net.

13.

Knowledge-based voting algorithm for automated protein functional annotation

Yu GX Glass EM Karonis NT Maltsev N 《Proteins》2005,61(4):907-917

Automated annotation of high-throughput genome sequences is one of the earliest steps toward a comprehensive understanding of the dynamic behavior of living organisms. However, the step is often error-prone because of its underlying algorithms, which rely mainly on a simple similarity analysis, and lack of guidance from biological rules. We present herein a knowledge-based protein annotation algorithm. Our objectives are to reduce errors and to improve annotation confidences. This algorithm consists of two major components: a knowledge system, called "RuleMiner," and a voting procedure. The knowledge system, which includes biological rules and functional profiles for each function, provides a platform for seamless integration of multiple sequence analysis tools and guidance for function annotation. The voting procedure, which relies on the knowledge system, is designed to make (possibly) unbiased judgments in functional assignments among complicated, sometimes conflicting, information. We have applied this algorithm to 10 prokaryotic bacterial genomes and observed a significant improvement in annotation confidences. We also discuss the current limitations of the algorithm and the potential for future improvement.
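To make the notion of a voting procedure over conflicting tool outputs concrete, here is a small Python sketch of weighted voting. It is only an illustration of the general technique, not the RuleMiner algorithm: the tool names, the weights, the `min_share` cutoff, and the function name `vote` are all hypothetical.

```python
from collections import defaultdict


def vote(predictions, weights, min_share=0.5):
    """Pick the function with the largest weighted vote.

    predictions maps a tool name to the function it predicts.
    weights maps a tool name to its (assumed) reliability weight;
    tools without a weight count as 1.0.
    Returns (function, share), or (None, share) when the winner's
    share of the total weight is below min_share.
    """
    if not predictions:
        return None, 0.0
    totals = defaultdict(float)
    for tool, function in predictions.items():
        totals[function] += weights.get(tool, 1.0)
    best = max(totals, key=totals.get)
    share = totals[best] / sum(totals.values())
    return (best, share) if share >= min_share else (None, share)


# Hypothetical tool outputs for one protein.
predictions = {"blast": "kinase", "hmm": "kinase", "motif": "phosphatase"}
weights = {"blast": 1.0, "hmm": 2.0, "motif": 1.0}
result = vote(predictions, weights)
```

Returning the winning share alongside the call gives a simple confidence value, which matches the abstract's stated goal of improving annotation confidence, though the real scoring is likely more elaborate.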

14.

Assigning new GO annotations to protein data bank sequences by combining structure and sequence homology

Ponomarenko JV Bourne PE Shindyalov IN 《Proteins》2005,58(4):855-865

Accompanying the discovery of an increasing number of proteins, there is the need to provide functional annotation that is both highly accurate and consistent. The Gene Ontology (GO) provides consistent annotation in a computer readable and usable form; hence, GO annotation (GOA) has been assigned to a large number of protein sequences based on direct experimental evidence and through inference determined by sequence homology. Here we show that this annotation can be extended and corrected for cases where protein structures are available. Specifically, using the Combinatorial Extension (CE) algorithm for structure comparison, we extend the protein annotation currently provided by GOA at the European Bioinformatics Institute (EBI) to further describe the contents of the Protein Data Bank (PDB). Specific cases of biologically interesting annotations derived by this method are given. Given that the relationship between sequence, structure, and function is complicated, we explore the impact of this relationship on assigning GOA. The effect of superfolds (folds with many functions) is considered and, by comparison to the Structural Classification of Proteins (SCOP), the individual effects of family, superfamily, and fold.

15.

prot4EST: Translating Expressed Sequence Tags from neglected genomes

James D Wasmuth Mark L Blaxter 《BMC bioinformatics》2004,5(1):187


16.

Intron length distributions and gene prediction (total citations: 2; self-citations: 1; citations by others: 1)

Roy SW Penny D 《Nucleic acids research》2007,35(14):4737-4742


17.

Domestication of transposable elements into MicroRNA genes in plants

Li Y Li C Xia J Jin Y 《PloS one》2011,6(5):e19212


18.

GOChase: correcting errors from Gene Ontology-based annotations for gene products

Park YR Park CH Kim JH 《Bioinformatics (Oxford, England)》2005,21(6):829-831

SUMMARY: The Gene Ontology (GO) is a controlled biological vocabulary that provides three structured networks of terms to describe biological processes, cellular components and molecular functions. Many databases of gene products are annotated using the GO vocabularies. We found that some GO-updating operations are not easily traceable by the current biological databases and GO browsers. Consequently, numerous annotation errors arise and are propagated throughout biological databases and GO-based high-level analyses. GOChase is a set of web-based utilities to detect and correct the errors in GO-based annotations.
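One kind of error that ontology updates can cause is an annotation that still points at a term that has since been retired or merged. The Python sketch below shows how such annotations could be detected and, where a replacement is recorded, remapped. It is a guess at the general approach rather than a description of GOChase, and the term identifiers, gene names, and the function name `check_annotations` are placeholders invented for this example.

```python
def check_annotations(annotations, obsolete, replaced_by):
    """Split (gene, term) annotations into kept, remapped, and flagged.

    obsolete is a set of retired term IDs with no known replacement.
    replaced_by maps a retired or merged term ID to its current ID.
    """
    kept, remapped, flagged = [], [], []
    for gene, term in annotations:
        if term in replaced_by:
            remapped.append((gene, term, replaced_by[term]))
        elif term in obsolete:
            flagged.append((gene, term))      # needs manual review
        else:
            kept.append((gene, term))
    return kept, remapped, flagged


# Placeholder identifiers, not real GO terms.
annotations = [
    ("geneA", "GO:9000001"),
    ("geneB", "GO:9000002"),
    ("geneC", "GO:9000003"),
]
obsolete = {"GO:9000003"}
replaced_by = {"GO:9000002": "GO:9000010"}

kept, remapped, flagged = check_annotations(annotations, obsolete, replaced_by)
```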

19.

bitacora: A comprehensive tool for the identification and annotation of gene families in genome assemblies

Joel Vizueta Alejandro Sánchez-Gracia Julio Rozas 《Molecular ecology resources》2020,20(5):1445-1452


20.

Efficient secondary database driven annotation using model organism sequences

Faria-Campos AC Campos SV Prosdocimi F Franco GC Franco GR Ortega JM 《In silico biology》2006,6(5):363-372
