Similar literature
 20 similar documents found
1.
2.
3.
4.
MOTIVATION: Accurate gene structure annotation is a challenging computational problem in genomics. The best results are achieved with spliced alignment of full-length cDNAs or multiple expressed sequence tags (ESTs) with sufficient overlap to cover the entire gene. For most species, cDNA and EST collections are far from comprehensive. We sought to overcome this bottleneck by exploring the possibility of using combined EST resources from fairly diverged species that still share a common gene space. Previous spliced alignment tools were found inadequate for this task because they rely on very high sequence similarity between the ESTs and the genomic DNA. RESULTS: We have developed a computer program, GeneSeqer, which is capable of aligning thousands of ESTs with a long genomic sequence in a reasonable amount of time. The algorithm is uniquely designed to tolerate a high percentage of mismatches and insertions or deletions in the EST relative to the genomic template. This feature allows use of non-cognate ESTs for gene structure prediction, including ESTs derived from duplicated genes and homologous genes from related species. The increased gene prediction sensitivity results in part from novel splice site prediction models that are also available as a stand-alone splice site prediction tool. We assessed GeneSeqer performance relative to a standard Arabidopsis thaliana gene set and demonstrate its utility for plant genome annotation. In particular, we propose that this method provides a timely tool for the annotation of the rice genome, using abundant ESTs from other cereals and plants. AVAILABILITY: The source code is available for download at http://bioinformatics.iastate.edu/bioinformatics2go/gs/download.html. Web servers for Arabidopsis and other plant species are accessible at http://www.plantgdb.org/cgi-bin/AtGeneSeqer.cgi and http://www.plantgdb.org/cgi-bin/GeneSeqer.cgi, respectively. For non-plant species, use http://bioinformatics.iastate.edu/cgi-bin/gs.cgi. 
The splice site prediction tool (SplicePredictor) is distributed with the GeneSeqer code. A SplicePredictor web server is available at http://bioinformatics.iastate.edu/cgi-bin/sp.cgi

5.
In recent years, the identification of alternative splicing (AS) variants has been gaining momentum. We developed AVATAR, a database for documenting AS using 5,469,433 human EST sequences and 26,159 human mRNA sequences. AVATAR contains 12,000 alternative splicing sites identified by mapping ESTs and mRNAs to the whole human genome sequence. AVATAR also contains AS information for 6 eukaryotes. We mapped EST alignment information into a graph model in which exons and introns are represented by vertices and edges, respectively. AVATAR can be queried using search parameters such as (1) gene names, (2) the number of AS events identified in a gene, and (3) the minimum number of ESTs supporting a splicing site. The system provides visualized AS information for the queried genes.

Availability
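The graph model mentioned in the AVATAR abstract (exons as vertices, introns as edges) can be illustrated with a minimal sketch. This is not AVATAR's implementation: the function names, the input layout (each EST already reduced to a list of exon intervals), and the `min_support` parameter are assumptions made for illustration, and real EST alignments would need their ragged terminal exon boundaries reconciled first.

```python
from collections import defaultdict

def build_splice_graph(est_alignments):
    """Build a toy splice graph.

    est_alignments: dict mapping a (hypothetical) EST id to its list of
    (start, end) exon intervals in genomic order.
    Returns (exons, introns): the set of exon vertices, and a dict mapping
    each (upstream_exon, downstream_exon) edge to the set of supporting ESTs.
    """
    exons = set()
    introns = defaultdict(set)
    for est_id, exon_list in est_alignments.items():
        exons.update(exon_list)
        # Each pair of consecutive exons in one EST implies one intron edge.
        for upstream, downstream in zip(exon_list, exon_list[1:]):
            introns[(upstream, downstream)].add(est_id)
    return exons, dict(introns)

def branching_exons(introns, min_support=1):
    """Return exons with two or more outgoing intron edges, keeping only
    edges supported by at least min_support ESTs."""
    successors = defaultdict(set)
    for (upstream, downstream), ests in introns.items():
        if len(ests) >= min_support:
            successors[upstream].add(downstream)
    return {exon: sorted(nexts)
            for exon, nexts in successors.items() if len(nexts) > 1}

# Illustrative data only: est2 skips the middle exon.
alignments = {
    "est1": [(100, 200), (300, 400), (500, 600)],
    "est2": [(100, 200), (500, 600)],
    "est3": [(100, 200), (300, 400), (500, 600)],
}
exons, introns = build_splice_graph(alignments)
events = branching_exons(introns, min_support=1)
```

Raising `min_support` mirrors the "minimal number of ESTs supporting a splicing site" query parameter: with `min_support=2`, the exon-skipping edge seen in only one EST is no longer reported.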


6.
Large numbers of expressed sequence tags (ESTs) have now been generated from a variety of model organisms. In plants, substantial collections of ESTs are available for Arabidopsis and rice, in each case representing significant proportions of the estimated total numbers of genes. Large-scale comparisons of Arabidopsis and rice sequences are especially interesting because these two species represent the two subclasses of the flowering plants (Dicotyledonae and Monocotyledonae, respectively). Here we present the results of a systematic analysis of the Arabidopsis and rice EST sets. Non-redundant sets of sequences from Arabidopsis and rice were first derived separately and then combined so that gene families common to the two species could be identified. Our results show that 58% of non-singleton ESTs are derived from genes in gene families common to the two species. These gene families constitute the basis of a core set of higher plant genes.

7.
8.
PIP: a database of potential intron polymorphism markers   (Total citations: 3; self-citations: 0; citations by others: 3)
MOTIVATION: With the recent progress made in large-scale plant functional genome sequencing projects, a great amount of EST (expressed sequence tag) data is becoming available. With the help of the complete genomic sequence information of model plants (rice and Arabidopsis), it is possible to predict the joints between adjacent exons after splicing (termed 'intron positions' for short) in homologous ESTs of other plants. This would allow potential intron polymorphism (PIP) markers to be developed in these plants by designing primers in the exons flanking the target intron. RESULTS: We have extracted a total of 57,658 PIP markers in 59 plant species and created a web-based database platform named PIP to provide detailed information on these PIP markers and the homologous relationships among PIP markers from different species. The platform also provides a function for designing PIP markers online from cDNA/EST sequences submitted by users. With evaluations performed in silico, we have found that the intron position prediction is highly reliable and that the polymorphism level of PIP markers is high enough for practical needs. AVAILABILITY: http://ibi.zju.edu.cn/pgl/pip/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

9.
Exploring the plant transcriptome through phylogenetic profiling   (Total citations: 5; self-citations: 0; citations by others: 5)
Publicly available protein sequences represent only a small fraction of the full catalog of genes encoded by the genomes of different plants, such as green algae, mosses, gymnosperms, and angiosperms. By contrast, an enormous number of expressed sequence tags (ESTs) exist for a wide variety of plant species, representing a substantial part of all transcribed plant genes. Integrating protein and EST sequences in comparative and evolutionary analyses is not straightforward because of the heterogeneous nature of both types of sequence data. By combining information from publicly available EST and protein sequences for 32 different plant species, we identified more than 250,000 plant proteins organized in more than 12,000 gene families. Approximately 60% of the proteins are absent from current sequence databases but provide important new information about plant gene families. Analysis of the distribution of gene families over different plant species through phylogenetic profiling reveals interesting insights into plant gene evolution, and identifies species- and lineage-specific gene families, orphan genes, and conserved core genes across the green plant lineage. We counted a similar number of approximately 9,500 gene families in monocotyledonous and eudicotyledonous plants and found strong evidence for the existence of at least 33,700 genes in rice (Oryza sativa). Interestingly, the larger number of genes in rice compared to Arabidopsis (Arabidopsis thaliana) can partially be explained by a larger number of species-specific single-copy genes and species-specific gene families. In addition, a majority of large gene families, typically containing more than 50 genes, are bigger in rice than in Arabidopsis, whereas the opposite seems true for small gene families.
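As a rough illustration of phylogenetic profiling as described in this abstract, the sketch below turns gene family membership into presence/absence vectors across species and sorts families into coarse classes. It is a simplified sketch, not the authors' pipeline: the function names, the class labels, and the toy family and gene identifiers are all hypothetical.

```python
def phylogenetic_profiles(family_members, species):
    """Map each gene family to a presence/absence tuple over `species`.

    family_members: dict mapping a family id to a list of (species, gene) pairs.
    species: ordered list of species names defining the profile columns.
    """
    profiles = {}
    for family, members in family_members.items():
        present = {sp for sp, _gene in members}
        profiles[family] = tuple(int(sp in present) for sp in species)
    return profiles

def classify(profile):
    """Coarse, illustrative classification of one presence/absence profile."""
    n_present = sum(profile)
    if n_present == 0:
        return "absent"
    if n_present == len(profile):
        return "core"
    if n_present == 1:
        return "species-specific"
    return "shared by a subset"

# Toy input; identifiers are invented for the example.
species = ["rice", "Arabidopsis", "moss"]
families = {
    "F1": [("rice", "g1"), ("Arabidopsis", "g2"), ("moss", "g3")],
    "F2": [("rice", "g4"), ("rice", "g5")],
    "F3": [("rice", "g6"), ("Arabidopsis", "g7")],
}
profiles = phylogenetic_profiles(families, species)
classes = {family: classify(p) for family, p in profiles.items()}
```

Here `F1` is present in every species, `F2` occurs only in rice, and `F3` is shared by rice and Arabidopsis but missing from moss. A real analysis would interpret such subsets against the species tree to call lineage-specific families.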

10.
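A new EST clustering method   (Total citations: 11; self-citations: 0; citations by others: 11)
This study developed an EST (expressed sequence tag) clustering method, ESTClustering, for analyzing the large volumes of data produced by large-scale EST sequencing in order to obtain high-quality, non-redundant expressed sequences. During clustering, the method uses the MEGABLAST tool to compare consensus sequences for sequence homology and uses the phrap program to verify the assembly of each EST cluster. This clustering strategy reduces the impact of sequencing errors, effectively identifies gene family members, and avoids interference from alternative splicing. Compared with the UniGene clustering method of NCBI (National Center for Biotechnology Information), the results of ESTClustering better reflect the diversity of expressed sequences. In a clustering test on 112,256 Arabidopsis ESTs, ESTClustering produced 23,581 EST clusters, of which 13,597 have corresponding coding sequences in the Arabidopsis genome, close to the number of predicted genes in that genome that are supported by EST evidence. Applying the method to a collection of 147,191 rice EST sequences yielded 33,896 EST clusters.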
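The general idea of grouping ESTs from pairwise similarity hits can be sketched with single-linkage clustering over a union-find structure. This is a generic technique swapped in for illustration, not the ESTClustering algorithm itself: it omits the consensus comparison with MEGABLAST and the phrap assembly check, and the hit tuple layout and both thresholds are assumptions.

```python
def cluster_ests(est_ids, hits, min_identity=0.95, min_overlap=100):
    """Single-linkage clustering of ESTs from precomputed pairwise hits.

    hits: iterable of (est_a, est_b, identity, overlap_length) tuples, e.g.
    parsed from a similarity search; the thresholds are illustrative.
    Returns a sorted list of clusters, each a sorted list of EST ids.
    """
    parent = {est: est for est in est_ids}

    def find(x):
        # Follow parent links to the cluster representative, halving paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for est_a, est_b, identity, overlap in hits:
        # Only sufficiently long, sufficiently similar overlaps join clusters.
        if identity >= min_identity and overlap >= min_overlap:
            root_a, root_b = find(est_a), find(est_b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for est in est_ids:
        clusters.setdefault(find(est), []).append(est)
    return sorted(sorted(members) for members in clusters.values())

# Toy hits: c-d fails the identity cutoff, d-e fails the overlap cutoff.
hits = [
    ("a", "b", 0.98, 250),
    ("b", "c", 0.97, 180),
    ("c", "d", 0.80, 300),
    ("d", "e", 0.99, 40),
]
clusters = cluster_ests(["a", "b", "c", "d", "e"], hits)
```

Plain single linkage tends to merge gene family members and splice variants through chains of partial matches, which is the kind of problem the abstract says the per-cluster assembly check is meant to address.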

11.
Using a strategy requiring only modest computational resources, wheat expressed sequence tag (EST) sequences from various sources were assembled into contigs and compared with a nonredundant barley sequence assembly, with ESTs, with complete draft genome sequences of rice and Arabidopsis thaliana, and with ESTs from other plant species. These comparisons indicate that (i) wheat sequences available from public sources represent a substantial proportion of the diversity of wheat coding sequences, (ii) prediction of open reading frames in the whole genome sequence improves when supplemented with EST information from other species, (iii) a substantial number of candidates for novel genes that are unique to wheat or related species can be identified, and (iv) a smaller number of genes can be identified that are common to monocots and dicots but absent from Arabidopsis. The sequences in the last group may have been lost from Arabidopsis after descent from a common ancestor. Examples of potential novel wheat genes and Triticeae-specific genes are presented.

12.
Expressed sequence tags (ESTs) currently encompass more entries in the public databases than any other form of sequence data. Thus, EST data sets provide a vast resource for gene identification and expression profiling. We have mapped the complete set of 176,915 publicly available Arabidopsis EST sequences onto the Arabidopsis genome using GeneSeqer, a spliced alignment program incorporating sequence similarity and splice site scoring. About 96% of the available ESTs could be properly aligned with a genomic locus, with the remaining ESTs deriving from organelle genomes and non-Arabidopsis sources or displaying insufficient sequence quality for alignment. The mapping provides verified sets of EST clusters for evaluation of EST clustering programs. Analysis of the spliced alignments suggests corrections to current gene structure annotation and provides examples of alternative and non-canonical pre-mRNA splicing. All results of this study were parsed into a database and are accessible via a flexible Web interface at http://www.plantgdb.org/AtGDB/.

13.
14.
Impact of genomics approaches on plant genetics and physiology   (Total citations: 2; self-citations: 0; citations by others: 2)

15.
16.
Hormonal control of grass inflorescence development   (Total citations: 2; self-citations: 0; citations by others: 2)
Grass inflorescences produce the grain that feeds the world. Compared to eudicots such as Arabidopsis (Arabidopsis thaliana), grasses have a complex inflorescence morphology that can be explained by differences in the activity of axillary meristems. Advances in genomics, such as the completion of the rice (Oryza sativa) and sorghum (Sorghum bicolor) genomes and the recent release of a draft sequence of the maize (Zea mays) genome, have greatly facilitated research in grasses. Here, we review recent progress in the understanding of the genetic regulation of grass inflorescence development, with a focus on maize and rice. An exciting theme is the key role of plant growth hormones in inflorescence development.

17.
18.
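Wang Lei, Chen Jingtang, Zhang Zuxin. 《遗传》 (Hereditas (Beijing)), 2007, 29(9): 1055-1060
With the completion of genome sequencing projects for model plants such as Arabidopsis and rice, comparative genomics has developed rapidly in recent years as an emerging discipline, opening new avenues for studying the evolution, structure, and function of plant genomes. This article reviews the scope of and progress in comparative genomics in four areas: comparative genetic mapping in crops, microcolinearity of genic regions, comparisons at the EST and protein levels, and comparative-genomics-based cloning of genes and QTLs. It also analyzes the principles, characteristics, and feasibility of comparative genomics research strategies at different levels, with the aim of helping researchers use gene and genome data from model organisms and adopt comparative genomics strategies to clone functional genes for important crop traits and to elucidate genome structure and evolution.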

19.
20.