Similar literature
20 similar articles found (search time: 15 ms)
1.
Expressed sequence tag (EST) libraries from members of the Penaeidae family and brine shrimp (Artemia franciscana) are currently the primary source of sequence data for shrimp species. Penaeid shrimp are the most commonly farmed worldwide, but selection methods for improving shrimp are limited. A better understanding of shrimp genomics is needed for farmers to use genetic markers to select the best breeding animals. The ESTs from Litopenaeus vannamei have been previously mined for single nucleotide polymorphisms (SNPs). The present study took publicly available ESTs from nine shrimp species, excluding L. vannamei, clustered them with CAP3, predicted SNPs within them using SNPidentifier, and then analyzed whether the SNPs were intra- or interspecies. Major goals of the project were to predict SNPs that may distinguish shrimp species, locate SNPs that may segregate in multiple species, and determine the genetic similarities between L. vannamei and the other shrimp species based on their EST sequences. Overall, 4,597 SNPs were predicted from 4,600 contigs with 703 of them being interspecies SNPs, 735 of them possibly predicting species' differences, and 18 of them appearing to segregate in multiple species. While sequences appear relatively well conserved, SNPs do not appear to be well conserved across shrimp species.

2.
3.
We used 81,635 expressed sequence tags (ESTs) derived from 12 different cDNA libraries of the silkworm, Bombyx mori, inbred strain Dazao (P50), to identify high-quality candidate single nucleotide polymorphisms (SNPs). PHRAP assembly yielded 12,980 contigs, of which 11,537 were assembled from more than one read. From 117 contigs assembled from 1576 high-quality reads, base-called with PHRED and screened on the basis of the neighborhood quality standard (NQS), 101 candidate SNPs and 27 single-base insertions/deletions were identified. We also predicted 40 SNPs in coding regions (cSNPs), of which 26 were predicted to lead to non-synonymous amino acid variations and 14 to synonymous substitutions. The transition/transversion ratio of 1.66:1 differs from that of other insects. In this first SNP analysis of a lepidopteran, B. mori, the single nucleotide polymorphism density is estimated from sequence diversity to be 1.3 × 10⁻³. This analysis shows that expressed sequences from multiple libraries may provide an abundant source of comparative reads to mine for cSNPs from the silkworm genome.

4.
5.
6.
L D Chaves  J A Rowe  K M Reed 《Génome》2005,48(1):12-17
Genome characterization and analysis is an imperative step in identifying and selectively breeding for improved traits of agriculturally important species. Expressed sequence tags (ESTs) represent a transcribed portion of the genome and are an effective way to identify genes within a species. Downstream applications of EST projects include DNA microarray construction and interspecies comparisons. In this study, 694 ESTs were sequenced and analyzed from a library derived from a 24-day-old turkey embryo. The 437 unique sequences identified were divided into 76 assembled contigs and 361 singletons. The majority of significant comparative matches occurred between the turkey sequences and sequences reported from the chicken. Whole genome sequence from the chicken was used to identify potential exon-intron boundaries for selected turkey clones and intron-amplifying primers were developed for sequence analysis and single nucleotide polymorphism (SNP) discovery. Identified SNPs were genotyped for linkage analysis on two turkey reference populations. This study significantly increases the number of EST sequences available for the turkey.

7.
Ginger (Zingiber officinale Rosc.) is an important herb of the family Zingiberaceae. It is regarded as a universal cure for a multitude of diseases in Indian systems of medicine, and its rhizomes are equally popular as a spice ingredient throughout Asia. SNPs, the definitive genetic markers, represent the finest resolution of a DNA sequence, are abundant in populations with a low mutation rate, and are used for genomic analysis. Public EST sequences mostly lack quality files, which makes high-quality SNP detection more difficult because it must rely exclusively on sequence comparisons. In the present study, the current NCBI dbEST was mined, and 38,115 ginger EST sequences were obtained and assembled into contigs using the CAP3 program. The software tool QualitySNP was then used to detect 11,523 potential SNP sites, 8,810 high-quality SNPs and 1,008 indel polymorphisms, with a frequency of 1.61 SNPs per 10 kbp. Among the EST libraries generated from three ginger tissues, rhizomes had a frequency of 0.32 SNPs and 0.03 indels per 10 kbp, leaves had 2.51 SNPs and 0.23 indels per 10 kbp, and roots had 0.76 SNPs and 0.02 indels per 10 kbp. The analysis also provides information on the tissue-wise presence of haplotypes (222), the distribution of high-quality exonic (2,355) and intronic (6,455) SNPs, singletons (7,538), and the transition/transversion ratio in contigs (0.57). Across all tissues, transversions outnumbered transitions among the detected SNPs. The quality SNPs detected in this work can be used as markers in further genetic experiments on ginger.

8.
Using the Phred/Phrap/Polyphred/Consed pipeline established in the National Livestock Research Institute of Korea, we predicted candidate coding single nucleotide polymorphisms (cSNPs) from 7,600 expressed sequence tags (ESTs) derived from three cDNA libraries (liver, M. longissimus dorsi, and intermuscular fat) of Hanwoo (Korean native cattle) steers. From the 7,600 ESTs, 829 contigs comprising more than two EST reads were assembled using the Phrap assembler. Based on the contig analysis, 201 candidate cSNPs were identified in 129 contigs, in which transitions (69%) outnumbered transversions (31%). To verify whether the predicted cSNPs are real, 17 SNPs involved in lipid and energy metabolism were selected from the ESTs. Twelve of these were confirmed to be real, while five were identified as artifacts, possibly due to EST sequence errors. Further analysis of the 12 verified cSNPs was performed using the program BLASTX. Five were identified as nonsynonymous cSNPs, five were synonymous cSNPs, and two SNPs were located in 3'-UTRs. Our data indicated that a relatively high SNP prediction rate (71%) from a large EST database could produce abundant cSNPs rapidly, which can be used as valuable genetic markers in cattle.
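The transition/transversion split and the 71% figure quoted in this abstract can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the Phred/Phrap/Polyphred/Consed pipeline itself: the function names and the example allele pairs are hypothetical, and only the counts 12 and 17 come from the abstract.

```python
# Illustrative sketch only; not the pipeline used in the study.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    # Reject identical bases and anything that is not A, C, G or T.
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    # Purine<->purine or pyrimidine<->pyrimidine is a transition.
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


def transition_fraction(snps):
    """Fraction of transitions among an iterable of (ref, alt) pairs."""
    kinds = [classify_substitution(r, a) for r, a in snps]
    return kinds.count("transition") / len(kinds)


# Hypothetical allele pairs (not taken from the study).
example_snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "A")]
fraction = transition_fraction(example_snps)

# Validation rate from the abstract: 12 of 17 tested cSNPs were confirmed.
validation_rate = 12 / 17
```

With the counts from the abstract, `validation_rate` is about 0.706, which rounds to the reported 71%.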

9.
Using assembled expressed sequence tags (ESTs) from 14 different cDNA libraries containing 84,132 sequence reads, 556 Populus candidate single nucleotide polymorphisms (SNPs) were identified. Because traces were not available from dbEST (http://www.ncbi.nlm.nih.gov/dbEST/index.html), stringent filters were used to identify reliable candidate SNPs. Sequence analysis indicated that the main types of substitutions among candidate SNPs were A/G and T/C transitions, which accounted for 22.0% and 30.8%, respectively. One hundred and ten candidate SNPs were tested. As a result, 38 candidate SNPs were confirmed by direct sequencing of PCR products amplified from six different individuals. Thirteen new SNPs in intron regions were found, and multiple SNPs were found to be located in both intron and exon regions of four contigs. Heterozygosity was found in all 47 candidate sites, and five SNP sites were heterozygous in all six samples. This is the first report of SNP identification in a tree species, and it reveals that assembled ESTs from multiple libraries of the public database may provide a rich source of comparative sequences for an SNP search in the poplar genome.

10.
Kim JY  Park HS  Lim D  Jang HC  Park HS  Lee KT  Kim JS  Oh SI  Kweon MS  Kim TH  Choi BH 《BMB reports》2011,44(4):238-243
We generated 16,993 expressed sequence tags (ESTs) from two libraries containing full-length cDNAs from the brain and liver of the Korean Jindo dog. An additional 365,909 ESTs from other dog breeds were identified from the NCBI dbEST database, and all ESTs were clustered into 28,514 consensus sequences using StackPack. We selected the 7,305 consensus sequences that could be assembled from at least five ESTs and estimated that 12,533 high-quality single nucleotide polymorphisms (SNPs) were present in 97,835 putative SNPs from the 7,305 consensus sequences. We identified 58 Jindo dog-specific SNPs in comparison to other breeds and predicted seven synonymous SNPs and ten non-synonymous SNPs. Using PolyPhen, a program that predicts changes in protein structure and potential effects on protein function caused by amino acid substitutions, three of the non-synonymous SNPs were predicted to result in changes in protein function for proteins expressed by three different genes (TUSC3, ITIH2, and NAT2).

11.
We identified ~13 000 putative single nucleotide polymorphisms (SNPs) by comparison of repeat‐masked BAC‐end sequences from the cattle RPCI‐42 BAC library with whole‐genome shotgun contigs of cattle genome assembly Btau 1.0. Genotyping of a subset of these SNPs was performed on a panel containing 186 DNA samples from 18 cattle breeds including 43 trios. Of 1039 SNPs confirmed as polymorphic in the panel, 998 had minor allele frequency ≥0.25 among unrelated individuals of at least one breed. When Btau 4.0 became available, 974 of these validated SNPs were assigned in silico to known cattle chromosomes, while 41 SNPs were mapped to unassigned sequence scaffolds, yielding one SNP every ~3 Mbp on average. Twenty‐four SNPs identified in Btau 1.0 were not mapped to Btau 4.0. Of the 1015 SNPs mapped to Btau 4.0, 959 SNPs had nucleotide bases identical in Btau 4.0 and Btau 1.0 contigs, whereas 56 bases were changed, resulting in the loss of the in silico SNP in Btau 4.0. Because these 1039 SNPs were all directly confirmed by genotyping on the multi‐breed panel, it is likely that the original polymorphisms were correctly identified. The 1039 validated SNPs identified in this study represent a new and useful resource for genome‐wide association studies and applications in animal breeding.

12.
Mining single-nucleotide polymorphisms from hexaploid wheat ESTs.   Total citations: 20 (self-citations: 0; citations by others: 20)
Single-nucleotide polymorphisms (SNPs) represent a new form of functional marker, particularly when they are derived from expressed sequence tags (ESTs). A bioinformatics strategy was developed to discover SNPs within a large wheat EST database and to demonstrate the utility of SNPs in genetic mapping and genetic diversity applications. A collection of > 90000 wheat ESTs was assembled into contiguous sequences (contigs), and 45 random contigs were then visually inspected to identify primer pairs capable of amplifying specific alleles. We estimate that homoeologue sequence variants occurred 1 in 24 bp and the frequency of SNPs between wheat genotypes was 1 SNP/540 bp (theta = 0.0069). Furthermore, we estimate that one diagnostic SNP test can be developed from every contig with 10-60 EST members. Thus, EST databases are an abundant source of SNP markers. Polymorphism information content for SNPs ranged from 0.04 to 0.50 and ESTs could be mapped into a framework of microsatellite markers using segregating populations. The results showed that SNPs in wheat can be discovered in ESTs, validated, and applied to conventional genetic studies.
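The abstract does not say which formula was used for polymorphism information content (PIC). The sketch below assumes the commonly used definition PIC = 1 − Σ pᵢ², under which a biallelic SNP peaks at 0.50 when both alleles are equally frequent, consistent with the upper bound reported. The function name and the allele frequencies are hypothetical.

```python
def pic_heterozygosity(allele_freqs):
    """PIC computed as 1 - sum(p_i^2); this definition is an assumption."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Hypothetical allele frequencies for two biallelic SNPs.
balanced = pic_heterozygosity([0.5, 0.5])   # both alleles equally common
rare = pic_heterozygosity([0.98, 0.02])     # one allele is rare
```

Under this assumed definition, the balanced SNP gives 0.50 and the rare-allele SNP gives about 0.04, so a range of 0.04 to 0.50 would span SNPs from nearly fixed to evenly split alleles.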

13.
14.
Sweet orange [Citrus sinensis (L.) Osbeck] represents the most important Citrus species, followed by clementine (C. clementina Hort. ex Tan.). Citrus species and genotypes are difficult to recognize as they have a moderate level of diversity due to nucellar selection, vegetative propagation and origin by single spontaneous mutation. Despite the large number of available sequences and the existence of a draft assembly of sweet orange and clementine, there are currently no single nucleotide polymorphism (SNP) databases for Citrus species. For this purpose, the QualitySNP software was used to discover SNPs in 19 Citrus species starting from 540,000 expressed sequence tags (ESTs) assembled in 52,000 contigs. The vast majority of ESTs, contigs and SNPs were found in C. clementina and C. sinensis: 4,400 out of 16,000 contigs (27 %) of C. clementina and 4,100 out of 17,000 contigs (24 %) of C. sinensis contained putative SNPs. A total of 3,634 sequences were associated with enzymes belonging to 121 metabolic KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways, among which the secondary metabolite pathway was the most represented. A total of 163 SNPs from 52 contigs and genes of specific functional categories were validated and 81 polymorphic sites were found. Thirty-seven selected SNPs, validated by Sanger sequencing, confirmed that polymorphisms were mainly between species, while poor within-species variability was discovered. This work provides a collection of 15,879 putative SNP markers that could be exploited by the Citrus community. Furthermore, the validated SNPs associated with specific genes could be used for functional genetic studies in germplasm diversity analysis, mapping and breeding.

15.
Single nucleotide polymorphisms (SNPs) are useful for characterizing allelic variation, for genome-wide mapping, and as a tool for marker-assisted selection. Discovery of SNPs through de novo sequencing is inefficient within cultivated tomato (Lycopersicon esculentum Mill.) because the polymorphism rate is more than ten-fold lower than the sequencing error rate. The availability of expressed sequence tag (EST) data has made it feasible to discover putative SNPs in silico prior to experimental verification. By exploiting redundancy among EST data available for different varieties among 148,373 tomato ESTs, we have identified candidate SNPs for use within cultivated germplasm pools. A total of 1,245 contigs having three EST sequences of Rio Grande and three EST sequences of TA496 were used for SNP discovery. We detected 1 SNP for every 8,500 bases analyzed, with 101 candidate SNPs in 44 genes identified. Sixty-six SNPs could be recognized by restriction enzymes, and subsequent experimental verification using restriction digestion or CEL I digestion confirmed 83% of the putative polymorphisms tested. SNPs between TA496 and Rio Grande have a high probability (53%) of detecting polymorphisms between other L. esculentum varieties. Twenty-six SNPs in 18 unigenes were mapped to specific chromosomes. Two SNPs, LEOH23 and LEOH37, were shown to be linked to quantitative trait loci contributing to fruit color within elite breeding populations. These results suggest that the growing databases of DNA sequence will yield information that facilitates improvement within the germplasm pools that have contributed to productive modern varieties.

16.
Molecular markers are used to provide the link between genotype and phenotype, for the production of molecular genetic maps and to assess genetic diversity within and between related species. Single nucleotide polymorphisms (SNPs) are the most abundant molecular genetic marker. SNPs can be identified in silico, but care must be taken to ensure that the identified SNPs reflect true genetic variation and are not a result of errors associated with DNA sequencing. The SNP detection method autoSNP has been developed to identify SNPs from sequence data for any species. Confidence in the predicted SNPs is based on sequence redundancy, and haplotype co-segregation scores are calculated for a further independent measure of confidence. We have extended the autoSNP method to produce autoSNPdb, which integrates SNP and gene annotation information with a graphical viewer. We have applied this software to public barley expressed sequences, and the resulting database is available over the Internet. SNPs can be viewed and searched by sequence, functional annotation or predicted synteny with a reference genome, in this case rice. The correlation between SNPs and barley cultivar, expressed tissue type and development stage has been collated for ease of exploration. An average of one SNP per 240 bp was identified, with SNPs more prevalent in the 5' regions and simple sequence repeat (SSR) flanking sequences. Overall, autoSNPdb can provide a wealth of genetic polymorphism information for any species for which sequence data are available.
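The idea of basing confidence on sequence redundancy can be illustrated with a toy example. The sketch below is not the autoSNP implementation: the function name, the threshold `min_minor_count`, and the aligned reads are all hypothetical, and the haplotype co-segregation score mentioned in the abstract is not modelled. It reports an alignment column as a candidate SNP only when at least two different bases are each supported by a minimum number of reads, so that a difference seen in a single read is treated as a likely sequencing error.

```python
from collections import Counter


def call_snps(aligned_reads, min_minor_count=2):
    """Return [(column, {base: count})] for columns where at least two
    different bases are each seen in >= min_minor_count reads."""
    if not aligned_reads:
        return []
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to equal length")
    snps = []
    for col in range(length):
        counts = Counter(read[col].upper() for read in aligned_reads)
        # Keep only real bases (gaps and Ns are ignored) with enough support.
        supported = {base: n for base, n in counts.items()
                     if base in "ACGT" and n >= min_minor_count}
        if len(supported) >= 2:
            snps.append((col, supported))
    return snps


# Hypothetical aligned reads from one contig ('-' marks a gap).
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGAACGT",
    "ACGAACGT",
    "ACGTACCT",  # difference at column 6 is seen in one read only
    "ACG-ACGT",
]
candidates = call_snps(reads)
```

Here only column 3 (0-based) is reported, because both T and A are seen in at least two reads; the single C at column 6 does not reach the redundancy threshold.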

17.
A compilation of soybean ESTs: generation and analysis.   Total citations: 18 (self-citations: 0; citations by others: 18)
Whole-genome sequencing is fundamental to understanding the genetic composition of an organism. Given the size and complexity of the soybean genome, an alternative approach is targeted random-gene sequencing, which provides an immediate and productive method of gene discovery. In this study, more than 120000 soybean expressed sequence tags (ESTs) generated from more than 50 cDNA libraries were evaluated. These ESTs coalesced into 16928 contigs and 17336 singletons. On average, each contig was composed of 6 ESTs and spanned 788 bases. The average sequence length submitted to dbEST was 414 bases. Using only those libraries generating more than 800 ESTs each and only those contigs with 10 or more ESTs each, correlated patterns of gene expression among libraries and genes were discerned. Two-dimensional qualitative representations of contig and library similarities were generated based on expression profiles. Genes with similar expression patterns and, potentially, similar functions were identified. These studies provide a rich source of publicly available gene sequences as well as valuable insight into the structure, function, and evolution of a model crop legume genome.

18.
19.
The availability of genomic resources can facilitate progress in plant breeding through the application of advanced molecular technologies for crop improvement. This is particularly important in the case of less researched crops such as cassava, a staple and food security crop for more than 800 million people. Here, expressed sequence tags (ESTs) were generated from five drought-stressed and well-watered cassava varieties. Two cDNA libraries were developed: one from root tissue (CASR), the other from leaf, stem and stem meristem tissue (CASL). Sequencing generated 706 contigs and 3,430 singletons. These sequences were combined with those from two other EST sequencing initiatives and filtered based on sequence quality. Quality sequences were aligned using CAP3 and embedded in a Windows browser called HarvEST:Cassava, which is made available. HarvEST:Cassava consists of a Unigene set of 22,903 quality sequences. A total of 2,954 putative SNPs were identified. Of these, 1,536 SNPs from 1,170 contigs and 53 cassava genotypes were selected for SNP validation using Illumina’s GoldenGate assay. As a result, 1,190 SNPs were validated technically and biologically. The location of validated SNPs on scaffolds of the cassava genome sequence (v.4.1) is provided. A diversity assessment of 53 cassava varieties reveals some sub-structure based on geographical origin, greater diversity in the Americas than in Africa, and similar levels of diversity in West Africa and in southern, eastern and central Africa. The resources presented allow for improved genetic dissection of economically important traits and the application of modern genomics-based approaches to cassava breeding and conservation.

20.
Single nucleotide polymorphisms (SNPs) are the most abundant type of DNA polymorphism found in animal and plant genomes. They provide an important new source of molecular markers that are useful in genetic mapping, map-based positional cloning, quantitative trait locus mapping and the assessment of genetic distances between individuals. Very little is known about the frequency of SNPs in cassava. We have exploited the recently developed collection of cassava expressed sequence tags (ESTs) to detect SNPs in the five cultivars of cassava used to generate the sequences. The frequency of intra-cultivar and inter-cultivar SNPs after analysis of 111 contigs was one polymorphism per 905 bp and one per 1,032 bp, respectively, for a total of one per 509 bp. We have obtained further information on the frequency of SNPs in six cassava cultivars by analysis of 33 amplicons obtained from 3 EST and BAC end sequences. Overall, about 11 kb of DNA sequence was obtained for each cultivar. A total of 186 SNPs (136 and 50 from ESTs and BAC ends, respectively) were identified. Among these, 146 were intra-cultivar polymorphisms, while 80 were inter-cultivar polymorphisms. Thus the total frequency of SNPs was one per 62 bp. This information will help to develop new strategies for EST mapping as well as their association with phenotypic characteristics.
