Similar documents
1.
EVidenceModeler (EVM) is an automated eukaryotic gene structure annotation tool that reports gene structures as a weighted consensus of all available evidence. Combined with the Program to Assemble Spliced Alignments (PASA), EVM forms a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences show that EVM produces automated gene structure annotation approaching the quality of manual curation.
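To make "weighted consensus of all available evidence" concrete, here is a minimal sketch that scores competing candidate gene structures by a weighted sum of their supporting evidence and keeps the best one. The evidence categories, weights, and candidate names are hypothetical; the published tool scores individual exons and introns and searches for the best-scoring combination, which this sketch omits by comparing whole candidate structures only.

```python
from dataclasses import dataclass

# Hypothetical per-source weights; in a real pipeline these are user-configured.
WEIGHTS = {"ab_initio": 1.0, "protein_alignment": 5.0, "transcript_alignment": 10.0}


@dataclass
class Candidate:
    name: str
    # Evidence type -> amount of support (e.g. supporting bases) for this structure.
    support: dict


def consensus_score(candidate, weights=WEIGHTS):
    """Weighted sum of the evidence supporting one candidate gene structure."""
    return sum(weights.get(kind, 0.0) * amount for kind, amount in candidate.support.items())


def pick_consensus(candidates, weights=WEIGHTS):
    """Return the candidate structure with the highest weighted evidence score."""
    return max(candidates, key=lambda c: consensus_score(c, weights))


models = [
    Candidate("geneA_model1", {"ab_initio": 900}),
    Candidate("geneA_model2", {"ab_initio": 600, "transcript_alignment": 300}),
]
print(pick_consensus(models).name)  # geneA_model2 (score 3600 vs 900)
```

Because transcript evidence is weighted ten times more heavily than ab initio predictions here, a structure with less raw support but transcript backing wins; changing the weights changes the consensus, which is the configurability the abstract refers to.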

2.
Full-genome analysis of resistance gene homologues in rice   Total citations: 18, self-citations: 0, citations by others: 18
The availability of the rice genome sequence enabled the global characterization of nucleotide-binding site (NBS)–leucine-rich repeat (LRR) genes, the largest class of plant disease resistance genes. The rice genome carries approximately 500 NBS–LRR genes that are very similar to the non-Toll/interleukin-1 receptor homology region (TIR) class (class 2) genes of Arabidopsis, but none that are homologous to the TIR class genes. Over 100 of these genes were predicted to be pseudogenes in the rice cultivar Nipponbare, although some of them are functional in other rice lines. Over 80 other NBS-encoding genes were identified that belong to four different classes, only two of which are represented among dicotyledonous plant sequences in databases. The map positions of the identified genes show that they occur in clusters, many of which include members from distantly related groups. Members of phylogenetic subgroups of the class 2 NBS–LRR genes mapped to as many as ten different chromosomes. The duplication patterns of the NBS–LRR genes indicate that they arose through many independent genetic events that have occurred continuously throughout the expansion of the NBS–LRR superfamily and the evolution of the modern rice genome. Genetic events such as inversions, which prevent recently duplicated genes from recombining, promote the divergence of their sequences by inhibiting concerted evolution.

3.
The rice (Oryza sativa L.) Xa3/Xa26 gene, which confers race-specific resistance to bacterial blight and encodes a leucine-rich repeat (LRR) receptor kinase-like protein, belongs to a multigene family of tandemly clustered homologous genes that colocalizes with several uncharacterized genes for resistance to bacterial blight or fungal blast. To learn more about the expression and biochemical characteristics of the Xa3/Xa26 family, we analyzed its members. Four Xa3/Xa26 family members in the indica rice variety Teqing, which carries a bacterial blight resistance gene tightly linked to Xa3/Xa26, and five family members in the japonica variety Nipponbare, which carries at least one uncharacterized blast resistance gene, were constitutively expressed in leaf tissue. This suggests that some family members may be candidates for these uncharacterized resistance genes. At least five putative N-glycosylation sites in the LRR domain of the XA3/XA26 protein are not glycosylated. XA3/XA26 and its family members MRKa and MRKc all possess consensus sequences of paired cysteines immediately before the first LRR and immediately after the last LRR, which are thought to function in receptor dimerization for signal transduction. However, neither XA3/XA26 homodimers nor heterodimers between XA3/XA26 and MRKa or MRKc were formed, indicating that the XA3/XA26 protein may function either as a monomer or as a heterodimer with another protein from outside the XA3/XA26 family. These results provide valuable information for further investigation of this multiprotein family.

4.
The innate immune responses mediated by Toll-like receptors (TLR) provide an evolutionarily well-conserved first line of defense against microbial pathogens. In the Reactome Knowledgebase we previously integrated annotations of human TLR molecular functions with those of over 4000 other human proteins involved in processes such as adaptive immunity, DNA replication, signaling, and intermediary metabolism, and linked these annotations to external resources, including PubMed, UniProt, EntrezGene, Ensembl, and the Gene Ontology, to generate a resource suitable for data mining, pathway analysis, and other systems biology approaches. We have now used a combination of manual expert curation and computer-based orthology analysis to generate a set of annotations for TLR molecular function in the chicken (Gallus gallus). The mammalian and avian lineages diverged approximately 300 million years ago, and the avian TLR repertoire consists of both orthologs and distinct new genes. The work described here centers on the molecular biology of TLR3, the host receptor that mediates responses to viral and other double-stranded polynucleotides, as a paradigm for our approach to integrated manual and computational annotation and data analysis. It tests the quality of computationally generated annotations projected from human onto other species, and supports a systems biology approach to analyzing virus-activated signaling pathways and identifying clinically useful antiviral measures.
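The idea of projecting curated human annotations onto another species through orthology can be sketched as follows: map each human participant of an annotated reaction to its ortholog, and set aside participants with no ortholog for manual review. The identifiers and the ortholog table are hypothetical, and this is a generic illustration, not Reactome's actual inference procedure.

```python
def project_annotation(human_participants, orthologs):
    """Project one human reaction/pathway annotation onto a target species.

    human_participants: human protein IDs participating in the annotated event.
    orthologs: hypothetical mapping from human ID to ortholog ID in the target species.
    Returns (projected target-species IDs, human IDs lacking an ortholog).
    The second list is what a curator would need to review by hand.
    """
    projected = [orthologs[p] for p in human_participants if p in orthologs]
    missing = [p for p in human_participants if p not in orthologs]
    return projected, missing


reaction = ["human_A", "human_B", "human_C"]
ortholog_table = {"human_A": "chicken_A", "human_B": "chicken_B"}
print(project_annotation(reaction, ortholog_table))
# (['chicken_A', 'chicken_B'], ['human_C'])
```

Keeping the unmapped participants explicit reflects the abstract's point that the avian repertoire contains both orthologs and lineage-specific genes, so a purely computational projection needs a manual check.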

5.
The majority of disease resistance (R) genes identified to date in plants encode a protein containing a nucleotide-binding site (NBS) and a leucine-rich repeat (LRR) domain. Additional domains such as coiled-coil (CC) and TOLL/interleukin-1 receptor (TIR) domains can also be present. In the recently sequenced Solanum tuberosum group phureja genome, we used hidden Markov models (HMMs) and manual curation to annotate 435 NBS-encoding R gene homologs and 142 NBS-derived genes that lack the NBS domain. Highly similar homologs were identified for most previously documented Solanaceae R genes. A surprising ~41% (179) of the 435 NBS-encoding genes are pseudogenes, primarily because of premature stop codons or frameshift mutations. Alignment of 81.80% of the 577 homologs to S. tuberosum group phureja pseudomolecules revealed a non-random distribution of the R genes: 362 of 470 genes were found in high-density clusters on 11 chromosomes.

6.
Despite the structure and objectivity provided by the Gene Ontology (GO), the annotation of proteins is a complex task that is subject to errors and inconsistencies. Electronically inferred annotations in particular are widely considered unreliable. However, because manual curation of all GO annotations is infeasible, it is imperative to improve the quality of electronically inferred annotations. In this work, we analyze the full GO molecular function annotation of UniProtKB proteins and discuss some of the issues that affect its quality, focusing particularly on the lack of annotation consistency. Based on our analysis, we estimate that 64% of UniProtKB proteins are incompletely annotated, and that inconsistent annotations affect 83% of the protein functions and at least 23% of the proteins. Additionally, we present and evaluate a data mining algorithm, based on the association rule learning methodology, for identifying implicit relationships between molecular function terms. The goal of this algorithm is to help GO curators update GO and correct and prevent inconsistent annotations. Our algorithm predicted 501 relationships with an estimated precision of 94%, whereas the basic association rule learning methodology predicted 12,352 relationships with a precision below 9%.
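The basic association rule learning baseline mentioned above can be sketched by treating each protein as a "transaction" of GO molecular function terms and keeping rules A → B whose support (proteins carrying both terms) and confidence (support divided by the count of A) pass thresholds. This is the plain baseline only, not the authors' refined algorithm; the term names, thresholds, and data are made up for illustration.

```python
from collections import Counter
from itertools import permutations


def mine_rules(annotations, min_support=2, min_confidence=0.8):
    """Find rules A -> B: 'proteins annotated with A are usually also annotated with B'.

    annotations: mapping protein ID -> set of molecular function terms.
    Returns sorted (A, B, support, confidence) tuples, where support is the number
    of proteins carrying both terms and confidence = support / count(A).
    """
    term_counts = Counter()
    pair_counts = Counter()
    for terms in annotations.values():
        term_counts.update(terms)
        # Ordered pairs, so A -> B and B -> A are evaluated separately.
        pair_counts.update(permutations(sorted(terms), 2))
    rules = []
    for (a, b), both in pair_counts.items():
        confidence = both / term_counts[a]
        if both >= min_support and confidence >= min_confidence:
            rules.append((a, b, both, confidence))
    return sorted(rules)


toy = {
    "P1": {"kinase", "ATP_binding"},
    "P2": {"kinase", "ATP_binding"},
    "P3": {"kinase", "ATP_binding", "metal_binding"},
    "P4": {"ATP_binding"},
}
print(mine_rules(toy))  # [('kinase', 'ATP_binding', 3, 1.0)]
```

In this toy set every "kinase" protein also carries "ATP_binding" (confidence 1.0), but not vice versa (3/4 = 0.75), so only one direction survives. A rule like this flags proteins annotated with A but missing B as possibly incomplete; the abstract's precision figures show why the raw baseline produces many spurious rules at scale.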

7.
Mechanisms of host plant resistance against insect pests can be manifold. Resistance screenings generally use a single target insect pest, but the resistance thus screened may not always be specific to the target species. We tested for non‐specific resistance in indica rice varieties carrying resistance genes against the brown planthopper (BPH), using the Indian meal moth, Plodia interpunctella. The test system was very simple, requiring only that the non-pest moth be reared on rice flour. We compared the survival rate, developmental period and adult weight of the moth on three rice varieties: ‘Nipponbare’, a BPH‐susceptible japonica variety, and ‘Thai Collection 11’ and ‘Pokkali’, two resistant indica varieties. The results clearly demonstrate that resistance in the two resistant varieties is not BPH specific, because development of the moth was retarded and adult body weight was reduced.

8.
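Genome-wide analysis of sequences encoding disease resistance gene homologs in rice   Total citations: 1, self-citations: 1, citations by others: 0
Using a fuzzy search approach, we identified 565 homologous sequences encoding disease resistance proteins in the TIGR rice Nipponbare genome database (TIGR Rice Genome Annotation Release 5). Each of the 565 sequences was then aligned with BLASTP against the indica rice genome database, identifying a total of 320 corresponding alleles. Using online bioinformatics software, we identified the conserved domains, conserved motifs, and transposable elements within the DNA sequences of these 565 resistance genes, and found that 14 of the resistance gene homologs had been misannotated. We also mapped the genomic distribution of these genes. Based on phylogenetic tree analysis and the physical distribution of these genes in the genome, we conclude that local (in situ) and distant duplication events produced the present distribution and diversity of resistance genes, with transposons playing an important role in the duplication process. These findings are important for research on disease resistance mechanisms and resistance gene evolution, and for transferring resistance genes through breeding.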

10.
Next‐generation technologies generate an overwhelming amount of gene sequence data. Efficient annotation tools are required to make these data amenable to functional genomics analyses. The Mercator pipeline automatically assigns functional terms to protein or nucleotide sequences. It uses the MapMan ‘BIN’ ontology, which is tailored for functional annotation of plant ‘omics’ data. The classification procedure performs parallel sequence searches against reference databases, compiles the results and computes the most likely MapMan BINs for each query. In the current version, the pipeline relies on manually curated reference classifications originating from the three reference organisms (Arabidopsis, Chlamydomonas, rice), various other plant species that have a reviewed SwissProt annotation, and more than 2000 protein domain and family profiles at InterPro, CDD and KOG. Functional annotations predicted by Mercator achieve accuracies above 90% when benchmarked against manual annotation. In addition to mapping files for direct use in the visualization software MapMan, Mercator provides graphical overview charts, detailed annotation information in a convenient web browser interface and a MapMan‐to‐GO translation table to export results as GO terms. Mercator is available free of charge via http://mapman.gabipd.org/web/guest/app/Mercator .
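The step "compiles the results and computes the most likely MapMan BINs for each query" can be illustrated with a simple aggregation: sum the search scores of all reference hits per BIN and report the BIN with the largest total. The summed-score rule, the BIN codes, and the reference IDs are assumptions for illustration; Mercator's actual scoring is not described in the abstract.

```python
from collections import defaultdict


def most_likely_bin(hits):
    """Pick the BIN with the largest summed hit score for one query sequence.

    hits: list of (reference_id, bin_code, score) tuples pooled from searches
    against several reference sets. Summing scores is an assumed aggregation
    rule used only for illustration. Returns None when there are no hits.
    """
    totals = defaultdict(float)
    for _reference_id, bin_code, score in hits:
        totals[bin_code] += score
    return max(totals, key=totals.get) if totals else None


query_hits = [
    ("ref1", "bin_x", 120.0),  # e.g. a hit in one reference proteome
    ("ref2", "bin_y", 200.0),  # a single strong hit to a different BIN
    ("ref3", "bin_x", 110.0),  # e.g. a matching domain profile
]
print(most_likely_bin(query_hits))  # bin_x (230.0 vs 200.0)
```

Pooling evidence across several reference sources lets agreement between weaker hits outweigh a single strong but isolated hit, which is one plausible reason to search multiple databases in parallel.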

11.
Genome annotation conceptually consists of inferring biological information and assigning it to gene products. Over the years, numerous pipelines and computational tools have been developed to automate this task and help researchers learn about their target genes. Even with these technological advances, however, manual annotation (manual curation), in which the information attributed to gene products is verified and enriched, remains necessary. Although it is considered the gold standard for depositing data in a biological database, manual curation demands considerable time and effort from researchers, who sometimes have to sift through numerous products across various public databases. To help address this problem, we present CODON, a tool for manual curation of genomic data that can also perform prediction and annotation. The software uses a finite state machine in the prediction process and automatically annotates products using information obtained from the UniProt database. CODON has a simple, intuitive graphical interface that supports manual curation, letting the user decide on each analysis based on identity, alignment length, and the name of the organism in which the product found a match. All matches found in the database can also be inspected visually, which substantially aids curation because the user has all available information for a given product at hand. The tool's efficiency was tested on eleven organisms by comparing CODON's prediction and annotation results with those from the NCBI and RAST platforms.
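The abstract says CODON uses a finite state machine for prediction but does not describe it. As a generic illustration of the technique only (not CODON's design), the sketch below scans the three forward reading frames with two states: "search" (looking for an ATG start codon) and "in_orf" (reading codons until a stop codon). A real gene predictor would also scan the reverse strand and consider alternative start codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Two-state scan of the three forward frames for ATG...stop open reading frames.

    States per frame: 'search' (waiting for ATG) and 'in_orf' (extending codon by
    codon until a stop). Returns sorted (start, end) pairs, 0-based and
    end-exclusive, including the stop codon; min_codons counts start and stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        state, start = "search", None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if state == "search" and codon == "ATG":
                state, start = "in_orf", i  # transition: start codon seen
            elif state == "in_orf" and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                state, start = "search", None  # transition: stop codon closes the ORF
    return sorted(orfs)


dna = "CCATGAAATGACC"
print(find_orfs(dna))  # [(2, 11)], i.e. ATG AAA TGA
```

Each frame is an independent run of the same small machine; the only memory needed is the current state and the start position, which is what makes the state-machine formulation compact.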

12.
With a new, highly contiguous Bos taurus reference genome assembly (ARS-UCD1.2) now available, this is an opportune time to upgrade the bovine gene set by seeking input from researchers. Furthermore, advances in graphical genome annotation tools now make it possible for researchers to leverage sequence data generated with the latest technologies to curate genes collaboratively. For many years the Bovine Genome Database (BGD) has provided tools such as the Apollo genome annotation editor to support manual bovine gene curation. The goal of this paper is to explain the reasoning behind the decisions made in the manual gene curation process, with examples using the existing BGD tools. We describe the sources of gene annotation evidence provided at the BGD, including RNA-seq and Iso-Seq data. We also explain how to interpret various data visualizations when curating gene models, and demonstrate the value of manual gene annotation. The process described here can be applied to manual gene curation for other species with similar tools. With a better understanding of manual gene annotation, researchers will be encouraged to edit gene models and contribute to the enhancement of livestock gene sets.

16.
We have developed a rice (Oryza sativa) genome annotation database (Osa1) that provides structural and functional annotation for this emerging model species. Using the sequence of O. sativa subsp. japonica cv Nipponbare from the International Rice Genome Sequencing Project, pseudomolecules, or virtual contigs, of the 12 rice chromosomes were constructed. Our most recent release, version 3, represents our third build of the pseudomolecules and is composed of 98% finished sequence. Genes were identified using a series of computational methods developed for Arabidopsis (Arabidopsis thaliana) that were modified for use with the rice genome. In release 3 of our annotation, we identified 57,915 genes, of which 14,196 are related to transposable elements. Of these 43,719 non-transposable element-related genes, 18,545 (42.4%) were annotated with a putative function, 5,777 (13.2%) were annotated as encoding an expressed protein with no known function, and the remaining 19,397 (44.4%) were annotated as encoding a hypothetical protein. Multiple splice forms (5,873) were detected for 2,538 genes, resulting in a total of 61,250 gene models in the rice genome. We incorporated experimental evidence into 18,252 gene models to improve the quality of the structural annotation. A series of functional data types has been annotated for the rice genome that includes alignment with genetic markers, assignment of gene ontologies, identification of flanking sequence tags, alignment with homologs from related species, and syntenic mapping with other cereal species. All structural and functional annotation data are available through interactive search and display windows as well as through download of flat files. To integrate the data with other genome projects, the annotation data are available through a Distributed Annotation System and a Genome Browser. All data can be obtained through the project Web pages at http://rice.tigr.org.
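The gene counts above can be checked with a few lines of arithmetic. The last step assumes, as the numbers imply, that "5,873 splice forms for 2,538 genes" counts every isoform of those genes, so they contribute 5,873 − 2,538 = 3,335 models beyond one per gene; that reading is an interpretation, not a statement from the abstract.

```python
total_genes = 57_915
te_related = 14_196
non_te = total_genes - te_related  # 43,719 genes not related to transposable elements

categories = {
    "putative function": 18_545,
    "expressed protein": 5_777,
    "hypothetical protein": 19_397,
}
# The three annotation categories should partition the non-TE genes exactly.
assert sum(categories.values()) == non_te

for label, count in categories.items():
    print(f"{label}: {100 * count / non_te:.1f}%")

# Assumed reading: 5,873 isoforms on 2,538 genes add 5,873 - 2,538 extra models.
gene_models = total_genes + (5_873 - 2_538)
print(gene_models)
```

Running it prints 42.4%, 13.2%, and 44.4% for the three categories and 61250 for the model total, matching the figures reported in the abstract.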

18.
Gene Ontology (GO) has established itself as the undisputed standard for protein function annotation. Most annotations are inferred electronically, i.e. without individual curator supervision, but they are widely considered unreliable. At the same time, we crucially depend on those automated annotations, as most newly sequenced genomes are of non-model organisms. Here, we introduce a methodology to systematically and quantitatively evaluate electronic annotations. By exploiting changes in successive releases of the UniProt Gene Ontology Annotation database, we assessed the quality of electronic annotations in terms of specificity, reliability, and coverage. Overall, we found not only that electronic annotations have improved significantly in recent years, but also that their reliability now rivals that of annotations inferred by curators when they use evidence other than experiments from the primary literature. This work provides the means to identify the subset of electronic annotations that can be relied upon, an important outcome given that >98% of all annotations are inferred without direct curation.
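As a rough illustration of comparing successive annotation releases, the sketch below treats each release as a set of (protein, GO term) pairs and computes two simple proxies: retention (the fraction of an older release's annotations still present in a newer one) and coverage (the fraction of a proteome with at least one annotation). These are simplified stand-ins chosen for illustration, not the exact reliability and coverage measures defined in the paper; the protein and term IDs are placeholders.

```python
def retention(old_release, new_release):
    """Fraction of (protein, term) annotations in old_release still present in new_release."""
    if not old_release:
        return 0.0
    return len(old_release & new_release) / len(old_release)


def coverage(release, proteome):
    """Fraction of proteins in proteome that have at least one annotation in release."""
    if not proteome:
        return 0.0
    annotated = {protein for protein, _term in release}
    return len(annotated & proteome) / len(proteome)


old = {("P1", "tA"), ("P2", "tB"), ("P3", "tC"), ("P4", "tD")}
new = {("P1", "tA"), ("P2", "tB"), ("P3", "tC"), ("P5", "tE")}
proteins = {"P1", "P2", "P3", "P4", "P5"}

print(retention(old, new))     # 0.75: the P4 annotation was dropped
print(coverage(new, proteins))  # 0.8: P4 has no annotation in the new release
```

An annotation that disappears in a later release is weak evidence that it was wrong, which is why tracking changes over time can say something about reliability without re-curating everything by hand.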

19.
Annotating the genome of Medicago truncatula   Total citations: 3, self-citations: 0, citations by others: 3
Medicago truncatula will be among the first plant species to benefit from the completion of a whole-genome sequencing project. For each of these species (Arabidopsis, rice, and now poplar and Medicago), annotation, the process of identifying gene structures and defining their functions, is essential if the research community is to benefit from the sequence data generated. Annotation of the Arabidopsis genome involved gene-by-gene curation of the entire genome, but the larger genomes of rice, Medicago and other species make it necessary to automate the annotation process. Drawing on the experience gained from previous whole-genome efforts, a coordinated international effort has generated a uniform set of Medicago gene annotations and, along with other views of the genome data, made it available to the research community at several websites.

20.

Background

Gene-list annotations are critical for researchers to explore the complex relationships between genes and functionalities. Currently, the annotations of a gene list are usually summarized by a table or a barplot. As such, potentially biologically important complexities such as one gene belonging to multiple annotation categories are difficult to extract. We have devised explicit and efficient visualization methods that provide intuitive methods for interrogating the intrinsic connections between biological categories and genes.

Findings

We have constructed a data model and now present two novel methods in a Bioconductor package, "GeneAnswers", to simultaneously visualize genes, concepts (a.k.a. annotation categories), and concept-gene connections (a.k.a. annotations): the "Concept-and-Gene Network" and the "Concept-and-Gene Cross Tabulation". These methods have been tested and validated with microarray-derived gene lists.

Conclusions

These new visualization methods can effectively present annotations using Gene Ontology, Disease Ontology, or any other user-defined gene annotations that have been pre-associated with an organism's genome by human curation, automated pipelines, or a combination of the two. The gene-annotation data model and associated methods are available in the Bioconductor package "GeneAnswers" described in this publication.
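The data behind a "Concept-and-Gene Cross Tabulation" can be sketched as a gene × concept 0/1 table built from concept-to-gene annotations, which makes genes belonging to several categories easy to spot. GeneAnswers itself is an R/Bioconductor package; this Python sketch only illustrates the data structure and does not reproduce its API. The toy concepts and gene sets are illustrative.

```python
def cross_tabulate(annotations):
    """Build a gene x concept incidence table from concept -> set-of-genes annotations.

    Returns (genes, concepts, rows), where rows[i][j] == 1 if genes[i]
    is annotated to concepts[j] and 0 otherwise.
    """
    concepts = sorted(annotations)
    genes = sorted({gene for members in annotations.values() for gene in members})
    rows = [[1 if gene in annotations[c] else 0 for c in concepts] for gene in genes]
    return genes, concepts, rows


def multi_concept_genes(genes, rows):
    """Genes annotated to more than one concept (easy to miss in a table or barplot)."""
    return [gene for gene, row in zip(genes, rows) if sum(row) > 1]


toy = {"apoptosis": {"TP53", "BAX"}, "cell cycle": {"TP53", "CDK1"}}
genes, concepts, rows = cross_tabulate(toy)
print(concepts)
for gene, row in zip(genes, rows):
    print(gene, row)
print(multi_concept_genes(genes, rows))  # ['TP53']
```

The same incidence table also defines the edges of a concept-and-gene network (one edge per 1 entry), so both visualizations can share one underlying data model.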


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417