Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Using a strategy requiring only modest computational resources, wheat expressed sequence tag (EST) sequences from various sources were assembled into contigs and compared with a nonredundant barley sequence assembly, with ESTs, with complete draft genome sequences of rice and Arabidopsis thaliana, and with ESTs from other plant species. These comparisons indicate that (i) wheat sequences available from public sources represent a substantial proportion of the diversity of wheat coding sequences, (ii) prediction of open reading frames in the whole genome sequence improves when supplemented with EST information from other species, (iii) a substantial number of candidates for novel genes that are unique to wheat or related species can be identified, and (iv) a smaller number of genes can be identified that are common to monocots and dicots but absent from Arabidopsis. The sequences in the last group may have been lost from Arabidopsis after descent from a common ancestor. Examples of potential novel wheat genes and Triticeae-specific genes are presented.

2.
3.
4.
5.
6.
7.
8.
Sequence similarity was used to predict the position of expressed sequence tags (ESTs) in the genome of the turkey (Meleagris gallopavo). Turkey EST sequences were compared with the draft assembly of the chicken whole-genome sequence and the chicken EST database by BLASTN. Among the 877 ESTs examined, 788 had significant matches in the chicken genome sequence. The positions of orthologous sequences in the chicken genome and the predicted positions of the EST loci in the turkey genome are presented. Genetic assignments suggest a high level of accuracy for the COMPASS predictions.

9.
Sequence similarity was used to predict the position of expressed sequence tags (ESTs) in the genome of the turkey (Meleagris gallopavo). Turkey EST sequences were compared with the draft assembly of the chicken whole-genome sequence and the chicken EST database by BLASTN. Among the 877 ESTs examined, 788 had significant matches in the chicken genome sequence. The positions of orthologous sequences in the chicken genome and the predicted positions of the EST loci in the turkey genome are presented. Genetic assignments suggest a high level of accuracy for the COMPASS predictions.

10.
11.
With the rapid advance of next-generation sequencing technologies, more and more species are being added to the list of organisms whose whole genomes have been sequenced. However, the assembled draft genomes of many organisms consist of numerous small contigs, owing to the short length of the reads generated by next-generation sequencing platforms. To improve the assembly and bring the genome contigs together, more genome resources are needed. In this study, we developed a strategy to generate a valuable genome resource, physical map contig-specific sequences, which are randomly distributed genome sequences in each physical contig. A two-dimensional tagging method was used to create specific tags for 1,824 physical contigs, which dramatically reduced the cost. A total of 94,111,841 100-bp reads and 315,277 assembled contigs were identified as containing physical map contig-specific tags. The physical map contig-specific sequences, along with the currently available BAC end sequences, were then used to anchor the catfish draft genome contigs. A total of 156,457 genome contigs (~79% of the whole genome sequencing assembly) were anchored and grouped into 1,824 pools, in which 16,680 unique genes were annotated. The physical map contig-specific sequences are valuable resources for linking the physical map, the genetic linkage map and the draft whole genome sequences; consequently, they can improve whole genome sequence assembly and scaffolding, as well as genome-wide comparative analysis. The strategy developed in this study could also be adopted in other species whose whole genome assembly still faces challenges.

12.
13.
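A new EST clustering method   Total citations: 11, self-citations: 0, citations by others: 11
This study developed an EST (expressed sequence tag) clustering method, ESTClustering, for analysing the large volume of data produced by large-scale EST sequencing in order to obtain high-quality, nonredundant expressed sequences. During clustering, the method uses the MEGABLAST tool to compare consensus sequences for sequence homology and uses the phrap program to verify the assembly of each EST cluster. This clustering strategy reduces the impact of sequencing errors, effectively identifies members of gene families, and avoids interference from alternative splicing. Compared with the UniGene clustering method of NCBI (National Center for Biotechnology Information), the clustering results of ESTClustering better reflect the diversity of expressed sequences. In a test clustering of 112,256 Arabidopsis ESTs, ESTClustering produced 23,581 EST clusters, of which 13,597 had corresponding coding sequences in the Arabidopsis genome, close to the number of predicted genes in that genome that are supported by EST evidence. Applying the method to 147,191 collected rice EST sequences yielded 33,896 EST clusters.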
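The abstract does not give the internals of ESTClustering, so the following is only a generic illustration of how pairwise similarity hits can be turned into clusters. It is a minimal single-linkage sketch using union-find, not the authors' method: the function name `cluster_by_hits`, the EST identifiers, and the hit pairs are all hypothetical, and the hits are assumed to have been produced beforehand by a similarity search such as MEGABLAST.

```python
def cluster_by_hits(ids, hits):
    """Group sequence IDs into clusters by single linkage.

    ids  : iterable of sequence identifiers (hypothetical names)
    hits : iterable of (id_a, id_b) pairs judged similar by an
           upstream search step (assumed, not computed here)
    """
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the cluster representative,
        # halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two clusters

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    # Sort members and clusters so the result is deterministic.
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    toy_ids = ["est1", "est2", "est3", "est4", "est5"]
    toy_hits = [("est1", "est2"), ("est2", "est3")]
    print(cluster_by_hits(toy_ids, toy_hits))
```

Plain single linkage merges any two sequences connected by a chain of hits, so it would not by itself separate gene family members or alternatively spliced transcripts; the abstract attributes that ability to the additional consensus comparison and phrap verification steps.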

14.
L D Chaves  J A Rowe  K M Reed 《Génome》2005,48(1):12-17
Genome characterization and analysis is an imperative step in identifying and selectively breeding for improved traits of agriculturally important species. Expressed sequence tags (ESTs) represent a transcribed portion of the genome and are an effective way to identify genes within a species. Downstream applications of EST projects include DNA microarray construction and interspecies comparisons. In this study, 694 ESTs were sequenced and analyzed from a library derived from a 24-day-old turkey embryo. The 437 unique sequences identified were divided into 76 assembled contigs and 361 singletons. The majority of significant comparative matches occurred between the turkey sequences and sequences reported from the chicken. The whole-genome sequence of the chicken was used to identify potential exon-intron boundaries for selected turkey clones, and intron-amplifying primers were developed for sequence analysis and single nucleotide polymorphism (SNP) discovery. Identified SNPs were genotyped for linkage analysis on two turkey reference populations. This study significantly increases the number of EST sequences available for the turkey.

15.
16.
17.
Among the cereals, wheat is the most widely grown geographically and is part of the staple diet in much of the world. Understanding how the cereal endosperm develops and functions will help generate better tools to manipulate grain qualities important to end-users. We used a genomics approach to identify and characterize genes that are expressed in the wheat endosperm. We analyzed the 17,949 publicly available wheat endosperm EST sequences to identify genes involved in the biological processes that occur within this tissue. Clustering and assembly of the ESTs resulted in the identification of 6,187 tentative unique genes, 2,358 of which formed contigs and 3,829 of which remained as singletons. A BLAST similarity search against the NCBI non-redundant sequence database revealed abundant messages for storage proteins, putative defense proteins, and proteins involved in starch and sucrose metabolism. The level of abundance of the putatively identified genes reflects the physiology of the developing endosperm. Half of the identified genes have unknown functions. Approximately 61% of the endosperm ESTs have been tentatively mapped in the hexaploid wheat genome. Using microarrays for global RNA profiling, we identified endosperm genes that are specifically upregulated in the developing grain.

18.
19.
20.
Illumina's Genome Analyzer generates ultra-short sequence reads, typically 36 nucleotides in length, and is primarily intended for resequencing. We tested the potential of this technology for de novo sequence assembly on the 6 Mbp genome of Pseudomonas syringae pv. syringae B728a with several freely available assembly software packages. Using an unpaired data set, velvet assembled >96% of the genome into contigs with an N50 length of 8289 nucleotides and an error rate of 0.33%. edena generated smaller contigs (N50 was 4192 nucleotides) and comparable error rates. ssake and vcake yielded shorter contigs with very high error rates. Assembly of paired-end sequence data carrying 400 bp inserts produced longer contigs (N50 up to 15 628 nucleotides), but with increased error rates (0.5%). Contig length and error rate were very sensitive to the choice of parameter values. Noncoding RNA genes were poorly resolved in de novo assemblies, while >90% of the protein-coding genes were assembled with 100% accuracy over their full length. This study demonstrates that, in practice, de novo assembly of 36-nucleotide reads can generate reasonably accurate assemblies from about 40× deep sequence data sets. These draft assemblies are useful for exploring an organism's proteomic potential at very low cost.
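The abstract above compares assemblers by N50 contig length. As a reminder of what that statistic measures, here is a minimal sketch of the usual definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the summed contig length. The function name `n50` and the toy contig lengths are assumptions for illustration and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the
    length of the contig at which the running total first reaches
    at least half of the total assembled length.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths, in nucleotides.
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
    print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why the abstract reports error rates alongside it.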
