Similar literature
 20 similar articles found.
1.
2.
Mining single-nucleotide polymorphisms from hexaploid wheat ESTs.   Cited by: 20 (self-citations: 0; citations by others: 20)
Single-nucleotide polymorphisms (SNPs) represent a new form of functional marker, particularly when they are derived from expressed sequence tags (ESTs). A bioinformatics strategy was developed to discover SNPs within a large wheat EST database and to demonstrate the utility of SNPs in genetic mapping and genetic diversity applications. A collection of >90,000 wheat ESTs was assembled into contiguous sequences (contigs), and 45 random contigs were then visually inspected to identify primer pairs capable of amplifying specific alleles. We estimate that homoeologue sequence variants occurred once every 24 bp and that the frequency of SNPs between wheat genotypes was 1 SNP/540 bp (theta = 0.0069). Furthermore, we estimate that one diagnostic SNP test can be developed from every contig with 10-60 EST members. Thus, EST databases are an abundant source of SNP markers. Polymorphism information content for SNPs ranged from 0.04 to 0.50, and ESTs could be mapped into a framework of microsatellite markers using segregating populations. The results showed that SNPs in wheat can be discovered in ESTs, validated, and applied to conventional genetic studies.
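
The abstract above reports polymorphism information content (PIC) values between 0.04 and 0.50 but does not state the formula used. The following is a minimal sketch, assuming the simplified form PIC = 1 - sum(p_i^2), which caps at 0.50 for a two-allele marker and is therefore consistent with the reported upper bound. The function name `pic_simple` is hypothetical and not taken from the paper. The fuller Botstein-style definition subtracts a further term and would cap at 0.375 for two alleles.

```python
def pic_simple(allele_freqs):
    """Simplified PIC (gene diversity): 1 - sum of squared allele frequencies.

    Assumption: this is the simplified form, not necessarily the one the
    authors used. allele_freqs must sum to 1.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# A balanced biallelic SNP gives the maximum value for two alleles.
print(pic_simple([0.5, 0.5]))
# A rare minor allele (2%) gives a value close to the lower end of the reported range.
print(pic_simple([0.98, 0.02]))
```

Under this assumption, the reported range of 0.04 to 0.50 would correspond to minor allele frequencies from roughly 2% up to 50%.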

3.
4.
The public EST (expressed sequence tag) databases represent an enormous but heterogeneous repository of sequences, including many from a broad selection of plant species and a wide range of distinct varieties. The significant redundancy within large EST collections makes them an attractive resource for rapid pre-selection of candidate sequence polymorphisms. Here we present a strategy that allows rapid identification of candidate SNPs in barley (Hordeum vulgare L.) using publicly available EST databases. Analysis of 271,630 EST sequences from different cDNA libraries, representing 23 different barley varieties, resulted in the generation of 56,302 tentative consensus sequences. In all, 8171 of these unigene sequences are members of clusters with six or more ESTs. By applying a novel SNP detection algorithm (SNiPpER) to these sequences, we identified 3069 candidate inter-varietal SNPs. In order to verify these candidate SNPs, we selected a small subset of 63 present in 36 ESTs. Of the 63 SNPs selected, we were able to validate 54 (86%) using a direct sequencing approach. For further verification, 28 ESTs were mapped to distinct loci within the barley genome. The polymorphism information content (PIC) and nucleotide diversity (π) values of the SNPs identified by the SNiPpER algorithm are significantly higher than those that were obtained by random sequencing. This demonstrates the efficiency of our strategy for SNP identification and the cost-efficient development of EST-based SNP markers. The first two authors contributed equally to this work.

5.
Single nucleotide polymorphisms in cytochrome P450 genes from barley   Cited by: 12 (self-citations: 0; citations by others: 12)
Plant cytochrome P450s are known to be essential in a number of economically important pathways of plant metabolism, but there are also many P450s of unknown function accumulating in expressed sequence tag (EST) and genomic databases. To detect trait associations that could assist in the assignment of gene function and provide markers for breeders selecting for commercially important traits, detection of polymorphisms in identified P450 genes is desirable. Polymorphisms in EST sequences provide so-called perfect markers for the associated genes. The International Triticeae EST Cooperative database of 24,344 ESTs was searched for sequences exhibiting homology to P450 genes representing the nine known clans of plant P450s. Seventy-five P450 ESTs were identified, of which 24 had best matches in GenBank to P450 genes of known function and 51 to P450s of unknown function. Sequence information from PCR products amplified from the genomic template DNA of 11 barley varieties was obtained using primers designed from six barley P450 ESTs and one durum wheat P450 EST. Single nucleotide polymorphisms (SNPs) between barley varieties were identified using five of the seven PCR products. A maximum of five SNPs and three haplotypes among the 11 barley lines were detected in products from any one primer pair. SNPs in three PCR products led to changes between barley varieties in at least one restriction site, enabling genotyping and mapping without the expense of a specialist SNP detection system. The overall frequency of SNPs across the 11 barley varieties was 1 every 131 bases.

6.
Xin D, Sun J, Wang J, Jiang H, Hu G, Liu C, Chen Q. Molecular Biology Reports, 2012, 39(9): 9047-9057
Microsatellites, or simple sequence repeats (SSRs), are very useful molecular markers for a number of plant species. We used a new publicly available module (TROLL) to extract microsatellites from the public database of soybean expressed sequence tag (EST) sequences. A total of 12,833 sequences containing di- to penta-type SSRs were identified from 200,516 non-redundant soybean ESTs. On average, one SSR was found per 7.25 kb of EST sequences, with the tri-nucleotide motifs being the most abundant. Primer sequences flanking the SSR motifs were successfully designed for 9,638 soybean ESTs using the software primer3.0, and only 59 pairs of them were found in earlier studies. We synthesized 124 pairs of the primers to determine the polymorphism and heterozygosity among eight genotypes of soybean cultivars, which represented a wide range of the cultivated soybean cultivars. PCR amplification products with anticipated SSRs were obtained with 81 pairs of primers; 36 PCR products appeared to be homozygous and the remaining 45 PCR products appeared to be heterozygous and displayed polymorphism among the eight cultivars. We further analysed the EST sequences containing 45 polymorphic EST-SSR markers using the programs BLASTN and BLASTX. Sequence alignment showed that 29 ESTs have homologous sequences and 15 ESTs could be classified into a Uni-gene cluster with comparatively convincing protein products. Among these 15 ESTs belonging to a Uni-gene cluster, 9 SSRs were located in the 3'-UTR, 4 SSRs were located in the intron region and 2 SSRs were located in the CDS region. None of these SSRs was located in the 5'-UTR. These novel SSRs identified in the ESTs of soybean provide useful information for gene mapping and cloning in future studies.
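
The SSR extraction step described above can be illustrated with a short regular-expression scan. This is only a sketch of the general idea of finding di- to penta-nucleotide tandem repeats; it is not the TROLL algorithm, and the function name `find_ssrs` and the default threshold of four repeats are assumptions made for illustration.

```python
import re


def find_ssrs(seq, min_repeats=4):
    """Return (start, motif, repeat_count) for di- to penta-nucleotide repeats.

    Assumption: a simple regex scan, not the method used by TROLL.
    Motifs made of a single base (e.g. 'AA') are skipped as mononucleotide runs.
    """
    # Lazily pick the shortest 2-5 bp motif, then require it to repeat in tandem.
    pattern = re.compile(r"([ACGT]{2,5}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


# An (AT)5 repeat starting at 0-based position 3.
print(find_ssrs("GGCATATATATATGGC"))
```

A production tool would also handle ambiguous bases, compound repeats and motif-specific length thresholds, none of which this sketch attempts.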

7.
Characterisation of single nucleotide polymorphisms in sugarcane ESTs   Cited by: 1 (self-citations: 0; citations by others: 1)
Commercial sugarcane cultivars (Saccharum spp. hybrids) are both polyploid and aneuploid with chromosome numbers in excess of 100; these chromosomes can be assigned to 8 homology groups. To determine the utility of single nucleotide polymorphisms (SNPs) as a means of improving our understanding of the complex sugarcane genome, we developed markers to a suite of SNPs identified in a list of sugarcane ESTs. Analysis of 69 EST contigs showed a median of 9 SNPs per EST and an average of 1 SNP per 50 bp of coding sequence. The quantitative presence of each base at 58 SNP loci within 19 contiguous sequence sets was accurately and reliably determined for 9 sugarcane genotypes, including both commercial cultivars and ancestral species, through the use of quantitative light emission technology in pyrophosphate sequencing. Across the 9 genotypes tested, 47 SNP loci were polymorphic and 11 monomorphic. Base frequency at individual SNP loci was found to vary approximately twofold between Australian sugarcane cultivars and more widely between cultivars and wild species. Base quantity was shown to segregate as expected in the IJ76-514 × Q165 sugarcane mapping population, indicating that SNPs that occur on one or two sugarcane chromosomes have the potential to be mapped. The use of SNP base frequencies from five of the developed markers was able to clearly distinguish all genotypes in the population. The use of SNP base frequencies from a further six markers within an EST contig was able to help establish the likely copy number of the locus in two genotypes tested. This is the first instance of a technology that has been able to provide an insight into the copy number of a specific gene locus in hybrid sugarcane. 
The identification of specific and numerous haplotypes/alleles present in a genotype by pyrophosphate sequencing or alternative techniques ultimately will provide the basis for identifying associations between specific alleles and phenotype and between allele dosage and phenotype in sugarcane. Electronic Supplementary Material: Supplementary material is available for this article and is accessible for authorized users.

8.
Single nucleotide polymorphisms (SNPs) are useful for characterizing allelic variation, for genome-wide mapping, and as a tool for marker-assisted selection. Discovery of SNPs through de novo sequencing is inefficient within cultivated tomato (Lycopersicon esculentum Mill.) because the polymorphism rate is more than ten-fold lower than the sequencing error rate. The availability of expressed sequence tag (EST) data has made it feasible to discover putative SNPs in silico prior to experimental verification. By exploiting redundancy among EST data available for different varieties among 148,373 tomato ESTs, we have identified candidate SNPs for use within cultivated germplasm pools. A total of 1,245 contigs, each having three EST sequences of Rio Grande and three EST sequences of TA496, were used for SNP discovery. We detected 1 SNP for every 8,500 bases analyzed, with 101 candidate SNPs in 44 genes identified. Sixty-six SNPs could be recognized by restriction enzymes, and subsequent experimental verification using restriction digestion or CEL I digestion confirmed 83% of the putative polymorphisms tested. SNPs between TA496 and Rio Grande have a high probability (53%) of detecting polymorphisms between other L. esculentum varieties. Twenty-six SNPs in 18 unigenes were mapped to specific chromosomes. Two SNPs, LEOH23 and LEOH37, were shown to be linked to quantitative trait loci contributing to fruit color within elite breeding populations. These results suggest that the growing databases of DNA sequence will yield information that facilitates improvement within the germplasm pools that have contributed to productive modern varieties.
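
The redundancy-based screen described above can be sketched as a column-wise scan of aligned reads from two varieties. This is a hypothetical illustration, not the authors' pipeline: the function name `candidate_snps`, the input layout and the rule "fixed within each variety but different between them" are all assumptions. The abstract states only that contigs with three reads from each variety were used.

```python
def candidate_snps(aligned_reads, min_reads=3):
    """Report columns where two varieties are each uniform but differ.

    aligned_reads: dict mapping exactly two variety names to lists of
    equal-length aligned read strings ('-' marks a gap).
    Assumption: requiring agreement among all reads of a variety is one
    simple way to filter out single-read sequencing errors.
    """
    (_, reads_a), (_, reads_b) = aligned_reads.items()
    if len(reads_a) < min_reads or len(reads_b) < min_reads:
        return []
    hits = []
    for pos in range(len(reads_a[0])):
        col_a = {read[pos] for read in reads_a}
        col_b = {read[pos] for read in reads_b}
        uniform = len(col_a) == 1 and len(col_b) == 1
        if uniform and col_a != col_b and "-" not in col_a | col_b:
            hits.append((pos, col_a.pop(), col_b.pop()))
    return hits


contig = {
    "Rio Grande": ["ACGTA", "ACGTA", "ACGTA"],
    "TA496": ["ACCTA", "ACCTA", "ACCTA"],
}
print(candidate_snps(contig))
```

If one TA496 read carried a G at that position, the column would no longer be uniform within the variety and the site would be dropped, which is how read redundancy helps separate true variants from sequencing errors in this sketch.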

9.
A search was performed for single-nucleotide polymorphisms (SNPs) and short insertions-deletions (indels) in 34 melon (Cucumis melo L.) expressed sequence tag (EST) fragments between two distantly related melon genotypes, a group Inodorus 'Piel de sapo' market class breeding line T111 and the Korean accession PI 161375. In total, we studied 15 kb of melon sequence. The average frequency of SNPs between the two genotypes was one every 441 bp. One indel was also found every 1666 bp. Seventy-five percent of the polymorphisms were located in introns and the 3' untranslated regions. On average, there were 1.26 SNPs plus indels per amplicon. We explored three different SNP detection systems to position five of the SNPs in a melon genetic map. Three of the SNPs were mapped using cleaved amplified polymorphic sequence (CAPS) markers, one SNP was mapped using the single primer extension reaction with fluorescent-labelled dideoxynucleotides, and one indel was mapped using polyacrylamide gel electrophoresis separation. The discovery of SNPs based on ESTs and a suitable system for SNP detection has broad potential utility in melon genome mapping.
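
The frequencies quoted above can be cross-checked with simple arithmetic. The sketch below assumes that "15 kb" means roughly 15,000 bp and that the 34 EST fragments correspond to 34 amplicons; the polymorphism counts it derives are inferred from the stated spacings and are not given in the abstract.

```python
# Values taken from the abstract; total_bp is an approximation of "15 kb".
total_bp = 15_000
snp_spacing_bp = 441      # one SNP every 441 bp
indel_spacing_bp = 1666   # one indel every 1666 bp
amplicons = 34            # assumption: one amplicon per EST fragment

# Implied counts (inferred, not reported directly in the abstract).
snps = round(total_bp / snp_spacing_bp)
indels = round(total_bp / indel_spacing_bp)
per_amplicon = (snps + indels) / amplicons

print(snps, indels, round(per_amplicon, 2))
```

Under these assumptions the implied counts reproduce the reported average of 1.26 SNPs plus indels per amplicon.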

10.
Temperature gradient capillary electrophoresis (TGCE) can be used to distinguish heteroduplex from homoduplex DNA molecules and can thus be applied to the detection of various types of DNA polymorphisms. Unlike most single nucleotide polymorphism (SNP) detection technologies, TGCE can be used even in the absence of prior knowledge of the sequences of the underlying polymorphisms. TGCE is both sensitive and reliable in detecting SNPs, small InDel (insertion/deletion) polymorphisms (IDPs) and simple sequence repeats, and using this technique it is possible to detect a single SNP in amplicons of over 800 bp and 1-bp IDPs in amplicons of approximately 500 bp. Genotyping data obtained via TGCE are consistent with data obtained via gel-based detection technologies. For genetic mapping experiments, TGCE has a number of advantages over alternative heteroduplex-detection technologies such as celery endonuclease (CELI) and denaturing high-performance liquid chromatography (dHPLC). Multiplexing can increase TGCE's throughput to 12 markers on 94 recombinant inbreds per day. Given its ability to efficiently and reliably detect a variety of subtle DNA polymorphisms that occur at high frequency in genes, TGCE shows great promise for discovering polymorphisms and conducting genetic mapping and genotyping experiments. Electronic Supplementary Material: Supplementary material is available for this article.

11.
Expressed sequence tags (ESTs) have proven to be a valuable tool to discover single nucleotide polymorphisms (SNPs) in human genes, but their use for this purpose is still limited in higher plants. Using a database of approximately 250,000 sugarcane ESTs, we have recovered 219 sequences encoding alcohol dehydrogenases (Adh), which tagged 178 distinct cDNAs from 27 libraries, constructed from at least four different cultivars. The partitioning of these ESTs into paralogous genes revealed three Adh genes expressed in sugarcane, one Adh2 and two Adh1. The soundness of the partition was carefully checked by comparison to external data, especially from the closely related sorghum. Analysis of polymorphism in the alignments of EST sequences revealed a total of 37 highly reliable SNPs in the coding and untranslated regions of the three Adh genes. In the coding regions, the mean occurrence of SNPs was one for every 122 base pairs. A total of eight insertion-deletions were observed, their occurrence being limited to untranslated regions. These results show that EST data constitute an invaluable source of sequence polymorphism for sugarcane that is worth carefully collecting for the future development of new marker tools.

12.
In this study, we describe the first set of SNP markers for the South African abalone, Haliotis midae. A cDNA library was constructed from which ESTs were selected for the screening of SNPs. The observed frequency of SNPs in this species was estimated at one every 185 bp. When characterized in wild-caught abalone, the minor allele frequencies and F(ST) estimates for every SNP indicated that these markers may potentially be useful for population analysis, parentage assignment and linkage mapping in Haliotis midae. No linkage disequilibrium was observed between SNPs originating from different EST sequences. These SNPs, together with additional SNPs currently being developed, will provide a useful complementary set of markers to the currently available genetic markers in abalone.

13.

Background

Availability of molecular markers has proven to be an efficient tool in facilitating progress in plant breeding, which is particularly important in the case of less researched crops such as cotton. Considering the obvious advantages of single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (InDels), expressed sequence tags (ESTs) were analyzed in silico to identify SNPs and InDels in this study, aiming to develop more molecular markers in cotton.

Results

A total of 1,349 EST-based SNP and InDel markers were developed by comparing ESTs between Gossypium hirsutum and G. barbadense, mining G. hirsutum unigenes, and analyzing 3′ untranslated region (3′UTR) sequences. The marker polymorphisms were investigated using the two parents of the mapping population based on the single-strand conformation polymorphism (SSCP) analysis. Of all the markers, 137 (10.16%) were polymorphic, and revealed 142 loci. Linkage analysis using a BC1 population mapped 133 loci on the 26 chromosomes. Statistical analysis of base variations in SNPs showed that base transitions accounted for 55.78% of the total base variations and gene ontology indicated that cotton genes varied greatly in harboring SNPs ranging from 1.00 to 24.00 SNPs per gene. Sanger sequencing of three randomly selected SNP markers revealed discrepancy between the in silico predicted sequences and the actual sequencing results.
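
The transition share reported above depends on classifying each base substitution as a transition (purine to purine or pyrimidine to pyrimidine) or a transversion. The following is a minimal sketch of that classification; the function names are hypothetical and the example substitutions are made up for illustration, not taken from the cotton data.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-base change as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def transition_fraction(substitutions):
    """Fraction of (ref, alt) pairs that are transitions."""
    kinds = [classify_substitution(ref, alt) for ref, alt in substitutions]
    return kinds.count("transition") / len(kinds)


# Illustrative input only: two transitions and two transversions.
example = [("A", "G"), ("C", "T"), ("A", "T"), ("G", "C")]
print(transition_fraction(example))
```

Applied to a full set of SNP calls, a helper of this kind would yield a percentage comparable to the 55.78% quoted in the results.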

Conclusions

In silico analysis is a double-edged sword for developing EST-SNP/InDel markers. On the one hand, the designed markers can be used effectively in tetraploid cotton genetic mapping, and the analysis helps to reveal the transition preference and SNP frequency of cotton genes. On the other hand, the efficiency of marker development and the polymorphism of the designed primers are comparatively low.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1046) contains supplementary material, which is available to authorized users.

14.
Discovery of single nucleotide polymorphisms (SNPs) requires analysis of redundant sequences such as those available in large public databases. The ability to detect SNPs, especially those of low frequency, is dependent on the depth and scale of the discovery effort. Large numbers of SNPs have been identified by mining large-scale EST surveys and whole genome sequencing projects. These surveys, however, are subject to ascertainment bias and the inherent errors in large-scale single-pass sequencing efforts. For example, the number of steps involved in the construction and sequencing of cDNA libraries makes ESTs highly error prone, resulting in an increased frequency of nonvalid SNPs obtained in these surveys. Sequences of mtDNA genes are often incorporated into cDNA libraries as an artifact of the library construction process and are typically either subtracted from cDNA libraries or considered superfluous when evaluating the information content of EST datasets. Sequences of mtDNA genes provide a unique resource for the analysis of SNP parameters in EST projects. This study uses sequences from four turkey muscle cDNA libraries to demonstrate how mtDNA sequences gleaned from collections of ESTs can be used to estimate SNP parameters and thus help predict the validity of SNPs.

15.

16.
Single nucleotide polymorphisms (SNPs) including insertion/deletions (indels) serve as useful and informative genetic markers. The availability of high-throughput and inexpensive SNP typing systems has increased interest in the development of SNP markers. After fragments of genes were amplified with primers derived from 110 soybean GenBank ESTs, sequencing data of PCR products from 15 soybean genotypes from Korea and the United States were analyzed by SeqScape software to find SNPs. Among 35 gene fragments with at least one SNP among the 15 genotypes, SNPs occurred at a frequency of 1 per 2,038 bp in 16,302 bp of coding sequence and 1 per 191 bp in 16,960 bp of noncoding regions. This corresponds to a nucleotide diversity (theta) of 0.00017 and 0.00186, respectively. Of the 97 SNPs discovered, 78 or 80.4% were present in the six North American soybean mapping parents. The addition of "Hwaeomputkong," which originated from Japan, increased the number to 92, or 94.8% of the total number of SNPs present among the 15 genotypes. Thus, Hwaeomputkong and the six North American mapping parents provide a diverse set of soybean genotypes that can be successfully used for SNP discovery in coding DNA and closely associated introns and untranslated regions.

17.
In this study, microsatellite markers were developed for the genetic linkage mapping and breeding program of the black tiger shrimp Penaeus monodon. A total of 997 unique microsatellite-containing expressed sequence tags (ESTs) were identified from 10,100 EST sequences in the P. monodon EST database. AT-rich microsatellite types were predominant in the EST sequences. Homology searching by the blastn and blastx programs revealed that these 997 ESTs represented 8.6% known gene products, 27.8% hypothetical proteins and 63.6% unknown gene products. Characterization of 50 markers on a panel of 35-48 unrelated shrimp indicated an average number of alleles of 12.6 and an average polymorphic information content of 0.723. These EST microsatellite markers along with 208 other markers (185 amplified fragment length polymorphisms, one exon-primed intron-crossing, six single strand conformation polymorphisms, one single nucleotide polymorphism, 13 non-EST-associated microsatellites and two EST-associated microsatellites) were analysed across the international P. monodon mapping family. A total of 144 new markers were added to the P. monodon maps, including 36 of the microsatellite-containing ESTs. The current P. monodon male and female linkage maps have 47 and 36 linkage groups, respectively, with coverage across half the P. monodon genome.

18.
Urofacial (Ochoa) syndrome is an autosomal recessive disease characterized by distorted facial expression and urinary abnormalities. Previously, we mapped the UFS gene to chromosome 10q23-q24 and narrowed the interval to one YAC clone of 1410 kb. Here, we have constructed a BAC/PAC contig of the 1-Mb region using STS content mapping with 42 BAC/PAC-end sequences, 9 previously reported and 16 newly identified microsatellite markers, and 14 EST markers. A total of 26 polymorphic microsatellite markers were genotyped for 31 UFS patients from Colombia and 2 patients from the United States. Haplotype analyses suggest that the UFS gene is located within two overlapping BAC clones, a region of <360 kb of DNA sequence. We tested 42 EST markers previously mapped to the D10S1709-D10S603 interval against the BAC/PAC contig and identified 11 ESTs located in the 1-Mb region. Four of the 11 ESTs mapped to the 360-kb UFS critical region. Shotgun sequencing of the two BAC clones and BLASTN search of the EST databases revealed 3 other ESTs contained in the UFS critical region. These results will facilitate the cloning and identification of the UFS gene.

19.
Linkage mapping of gene-associated SNPs to pig chromosome 11   Cited by: 3 (self-citations: 0; citations by others: 3)
Single nucleotide polymorphisms (SNPs) were discovered in porcine expressed sequence tags (ESTs) orthologous to genes from human chromosome 13 (HSA13) and predicted to be located on pig chromosome 11 (SSC11). The SNPs were identified as sequence variants in clusters of EST sequences from pig cDNA libraries constructed in the Sino-Danish pig genome project. In total, 312 human gene sequences from HSA13 were used for similarity searches in our pig EST database. Pig ESTs showing significant similarity with HSA13 genes were clustered and candidate SNPs were identified. Allele frequencies for 26 SNPs were estimated in a group of 80 unrelated pigs from Danish commercial pig breeds: Duroc, Hampshire, Landrace and Large White. Eighteen of the 26 SNPs genotyped in the PiGMaP Reference Families were mapped by linkage analysis to SSC11. The EST-based SNPs published here are new genetic markers useful for linkage and association studies in commercial and experimental pig populations. This study represents the first gene-associated SNP linkage map of pig chromosome 11 and adds new comparative mapping information between SSC11 and HSA13. Furthermore, our data facilitate future studies aimed at the identification of interesting regions on pig chromosome 11, positional cloning and fine mapping of quantitative trait loci in pig.

20.

Background

Homoeologous sequences pose a particular challenge if bacterial artificial chromosome (BAC) contigs shall be established for specific regions of an allopolyploid genome. Single nucleotide polymorphisms (SNPs) differentiating between homoeologous genomes (intergenomic SNPs) may represent a suitable screening tool for such purposes, since they do not only identify homoeologous sequences but also differentiate between them.

Results

Sequence alignments between Brassica rapa (AA) and Brassica oleracea (CC) sequences mapping to corresponding regions on chromosomes A1 and C1, respectively were used to identify single nucleotide polymorphisms between the A and C genomes. A large fraction of these polymorphisms was also present in Brassica napus (AACC), an allopolyploid species that originated from hybridisation of A and C genome species. Intergenomic SNPs mapping throughout homoeologous chromosome segments spanning approximately one Mbp each were included in Illumina’s GoldenGate® Genotyping Assay and used to screen multidimensional pools of a Brassica napus bacterial artificial chromosome library with tenfold genome coverage. Based on the results of 50 SNP assays, a BAC contig for the Brassica napus A subgenome was established that spanned the entire region of interest. The C subgenome region was represented in three BAC contigs.

Conclusions

This proof-of-concept study shows that sequence resources of diploid progenitor genomes can be used to deduce intergenomic SNPs suitable for multiplex polymerase chain reaction (PCR)-based screening of multidimensional BAC pools of a polyploid organism. Owing to their high abundance and ease of identification, intergenomic SNPs represent a versatile tool to establish BAC contigs for homoeologous regions of a polyploid genome.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-560) contains supplementary material, which is available to authorized users.
