Similar literature
 Found 20 similar documents; search took 31 ms
1.
Expressed sequence tags (ESTs) have proven to be a valuable tool for discovering single nucleotide polymorphisms (SNPs) in human genes, but their use for this purpose is still limited in higher plants. Using a database of approximately 250,000 sugarcane ESTs, we recovered 219 sequences encoding alcohol dehydrogenases (Adh), which tagged 178 distinct cDNAs from 27 libraries constructed from at least four different cultivars. Partitioning these ESTs into paralogous genes revealed three Adh genes expressed in sugarcane: one Adh2 and two Adh1. The soundness of the partition was carefully checked by comparison to external data, especially from the closely related sorghum. Analysis of polymorphism in the alignments of EST sequences revealed a total of 37 highly reliable SNPs in the coding and untranslated regions of the three Adh genes. In the coding regions, SNPs occurred on average once every 122 base pairs. A total of eight insertion-deletions were observed, all of them in untranslated regions. These results show that EST data constitute an invaluable source of sequence polymorphism for sugarcane that is worth carefully collecting for the future development of new marker tools.

2.
As the largest set of sequence variants, single-nucleotide polymorphisms (SNPs) constitute powerful assets for mapping genes and mutations related to common diseases and for pharmacogenetic studies. A major goal in human genetics is to establish a high-density map of the genome containing several hundred thousand SNPs. Here we assayed 3.7 Mb (154,397 bp in 24 alleles) of chromosome 14 expressed sequence tags (ESTs) and sequence-tagged sites, for sequence variation in DNA samples from 12 African individuals. We identified and mapped 480 biallelic markers (459 SNPs and 21 small insertions and deletions), equally distributed between EST and non-EST classes. Extensive research in public databases also yielded 604 chromosome 14 SNPs (dbSNPs), 520 of which could be mapped and 19 of which are common between CNG (i.e., identified at the Centre National de Génotypage) and dbSNP polymorphisms. We present a dense map of SNP variation of human chromosome 14 based on 981 nonredundant biallelic markers present among 1345 radiation hybrid mapped sequence objects. Next, bioinformatic tools allowed 945 significant sequence alignments to chromosome 14 contigs, giving the precise chromosome sequence position for 70% of the mapped sequences and SNPs. In addition, these tools also permitted the identification and mapping of 273 SNPs in 159 known genes. The availability of this SNP map will permit a wide range of genetic studies on a complete chromosome. The recognition of 45 genes with multiple SNPs, by allowing the construction of haplotypes, should facilitate pharmacogenetic studies in the corresponding regions.

3.
4.
FELINES (Finding and Examining Lots of Intron 'N' Exon Sequences) is a utility written to automate construction and analysis of high-quality intron and exon sequence databases produced from EST (expressed sequence tag) to genomic sequence alignments. We demonstrated the various programs of the FELINES utility by creating intron and exon sequence databases for the fungal organism Schizosaccharomyces pombe from alignments of EST to genomic sequences. In addition, we analyzed our constructed S. pombe sequence databases and the well-established Saccharomyces cerevisiae intron database from Manuel Ares' Laboratory for conserved sequence motifs. FELINES was shown to be useful for characterizing branch sites, polypyrimidine tracts and 5' and 3' splice sites in the intron databases and exonic splicing enhancers (ESEs) in S. pombe exons. FELINES is available at http://www.genome.ou.edu/informatics.html.

5.
AutoSNP is a program to detect single nucleotide polymorphisms (SNPs) and insertion/deletion polymorphisms (indels) in expressed sequence tag (EST) data. The program uses d2cluster and cap3 to cluster and align EST sequences, and uses redundancy to differentiate between candidate SNPs and sequence errors. Candidate polymorphisms are identified as occurring in multiple reads within an alignment. For each candidate SNP, two measures of confidence are calculated: the redundancy of the polymorphism at a SNP locus and the co-segregation of the candidate SNP with other SNPs in the alignment. AVAILABILITY: The program was written in PERL and is freely available to non-commercial users by request from the authors.
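The redundancy idea in this abstract can be illustrated with a minimal Python sketch. It is not AutoSNP's code (which the abstract says is written in PERL); the function name `candidate_snps`, the `min_redundancy` threshold, and the toy reads are all assumptions made for illustration. It takes reads that are already aligned to equal length and reports a column as a candidate SNP only when at least two different bases are each supported by multiple reads, so a variant seen in a single read is treated as a likely sequencing error.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_redundancy=2):
    """Return {column: {base: count}} for columns where at least two
    different bases each appear in `min_redundancy` or more reads.

    `aligned_reads` is a list of equal-length strings (hypothetical input
    format); gaps and ambiguous characters are ignored, so indels are not
    handled by this sketch.
    """
    if not aligned_reads:
        return {}
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to equal length")

    candidates = {}
    for col in range(length):
        counts = Counter(read[col].upper() for read in aligned_reads)
        # Keep only unambiguous bases that are seen often enough.
        supported = {
            base: n
            for base, n in counts.items()
            if base in "ACGT" and n >= min_redundancy
        }
        if len(supported) >= 2:
            candidates[col] = supported
    return candidates


# Toy alignment: column 2 has G in three reads and A in two reads;
# column 7 has a single A, which is not redundant enough to count.
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACATACGT",
    "ACATACGA",
    "ACGTACGT",
]
print(candidate_snps(reads))
```

The second confidence measure mentioned in the abstract, co-segregation with other SNPs in the alignment, is not modeled here.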

6.
Mining single-nucleotide polymorphisms from hexaploid wheat ESTs.   Total citations: 20; self-citations: 0; citations by others: 20
Single-nucleotide polymorphisms (SNPs) represent a new form of functional marker, particularly when they are derived from expressed sequence tags (ESTs). A bioinformatics strategy was developed to discover SNPs within a large wheat EST database and to demonstrate the utility of SNPs in genetic mapping and genetic diversity applications. A collection of > 90000 wheat ESTs was assembled into contiguous sequences (contigs), and 45 random contigs were then visually inspected to identify primer pairs capable of amplifying specific alleles. We estimate that homoeologue sequence variants occurred 1 in 24 bp and the frequency of SNPs between wheat genotypes was 1 SNP/540 bp (theta = 0.0069). Furthermore, we estimate that one diagnostic SNP test can be developed from every contig with 10-60 EST members. Thus, EST databases are an abundant source of SNP markers. Polymorphism information content for SNPs ranged from 0.04 to 0.50 and ESTs could be mapped into a framework of microsatellite markers using segregating populations. The results showed that SNPs in wheat can be discovered in ESTs, validated, and applied to conventional genetic studies.
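The reported polymorphism information content (PIC) range of 0.04 to 0.50 can be made concrete with a short Python sketch. The abstract does not say which formula was used; the sketch assumes the simplified form PIC = 1 − Σp², which has a maximum of 0.50 for a biallelic marker. The function name `pic_simple` and the example allele frequencies are hypothetical.

```python
def pic_simple(allele_freqs):
    """Simplified PIC, 1 - sum(p_i^2), for one marker.

    This is an assumed formula: the abstract does not state which PIC
    definition was used. `allele_freqs` must sum to 1.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Two equally frequent alleles give the biallelic maximum.
print(round(pic_simple([0.5, 0.5]), 4))
# A rare allele at 2% frequency gives a value near the low end of the range.
print(round(pic_simple([0.98, 0.02]), 4))
```

Under this assumption, a value of 0.50 corresponds to two alleles at equal frequency, and a value near 0.04 corresponds to a minor allele frequency of about 2%.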

7.
8.
Single nucleotide polymorphisms (SNPs) are a class of genetic markers that are well suited to a broad range of research and management applications. Although advances in genotyping chemistries and analysis methods continue to increase the potential advantages of using SNPs to address molecular ecological questions, the scarcity of available DNA sequence data for most species has limited marker development. As the number and diversity of species being targeted for large-scale sequencing has increased, so has the potential for using sequence from sister taxa for marker development in species of interest. We evaluated the use of Oncorhynchus mykiss and Salmo salar sequence data to identify SNPs in three other species (Oncorhynchus tshawytscha, Oncorhynchus nerka and Oncorhynchus keta). Primers designed based on O. mykiss and S. salar alignments were more successful than primers designed based on Oncorhynchus-only alignments for sequencing target species, presumably due to the much larger number of potential targets available from the former alignments and possibly greater sequence conservation in those targets. In sequencing approximately 89 kb we observed a frequency of 4.30 × 10⁻³ SNPs per base pair. Approximately half (53/101) of the subsequently designed validation assays resulted in high-throughput SNP genotyping markers. We speculate that this relatively low conversion rate may reflect the duplicated nature of the salmon genome. Our results suggest that a large number of SNPs could be developed for Pacific salmon using sequence data from other species. While the costs of DNA sequencing are still significant, these must be compared to the costs of using other marker classes for a given application.

9.
10.
Single nucleotide polymorphisms (SNPs) are useful for characterizing allelic variation, for genome-wide mapping, and as a tool for marker-assisted selection. Discovery of SNPs through de novo sequencing is inefficient within cultivated tomato (Lycopersicon esculentum Mill.) because the polymorphism rate is more than ten-fold lower than the sequencing error rate. The availability of expressed sequence tag (EST) data has made it feasible to discover putative SNPs in silico prior to experimental verification. By exploiting redundancy among EST data available for different varieties among 148,373 tomato ESTs, we have identified candidate SNPs for use within cultivated germplasm pools. 1,245 contigs having three EST sequences of Rio Grande and three EST sequences of TA496 were used for SNP discovery. We detected 1 SNP for every 8,500 bases analyzed, with 101 candidate SNPs in 44 genes identified. Sixty-six SNPs could be recognized by restriction enzymes, and subsequent experimental verification using restriction digestion or CEL I digestion confirmed 83% of the putative polymorphisms tested. SNPs between TA496 and Rio Grande have a high probability (53%) of detecting polymorphisms between other L. esculentum varieties. Twenty-six SNPs in 18 unigenes were mapped to specific chromosomes. Two SNPs, LEOH23 and LEOH37, were shown to be linked to quantitative trait loci contributing to fruit color within elite breeding populations. These results suggest that the growing databases of DNA sequence will yield information that facilitates improvement within the germplasm pools that have contributed to productive modern varieties.

11.
12.
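SNPs (single nucleotide polymorphisms) are extremely widely distributed in the pig genome, with an average spacing of 300-400 bp, and the relevant databases now hold some 550,000 entries. Sequencing of the pig genome has made substantial progress, large-scale searches for SNPs in genomic and EST (expressed sequence tag) sequences are under way, and SNP chips applicable at the whole-genome level in pigs have been established. Building on this, genetic map construction, QTL (quantitative trait loci) mapping, genetic diversity assessment and genome-wide association analysis based on pig SNP markers have all followed in succession.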

13.
The public EST (expressed sequence tag) databases represent an enormous but heterogeneous repository of sequences, including many from a broad selection of plant species and a wide range of distinct varieties. The significant redundancy within large EST collections makes them an attractive resource for rapid pre-selection of candidate sequence polymorphisms. Here we present a strategy that allows rapid identification of candidate SNPs in barley (Hordeum vulgare L.) using publicly available EST databases. Analysis of 271,630 EST sequences from different cDNA libraries, representing 23 different barley varieties, resulted in the generation of 56,302 tentative consensus sequences. In all, 8171 of these unigene sequences are members of clusters with six or more ESTs. By applying a novel SNP detection algorithm (SNiPpER) to these sequences, we identified 3069 candidate inter-varietal SNPs. In order to verify these candidate SNPs, we selected a small subset of 63 present in 36 ESTs. Of the 63 SNPs selected, we were able to validate 54 (86%) using a direct sequencing approach. For further verification, 28 ESTs were mapped to distinct loci within the barley genome. The polymorphism information content (PIC) and nucleotide diversity (π) values of the SNPs identified by the SNiPpER algorithm are significantly higher than those that were obtained by random sequencing. This demonstrates the efficiency of our strategy for SNP identification and the cost-efficient development of EST-based SNP markers. The first two authors contributed equally to this work.

14.
A system to use bovine EST data in conjunction with human genomic sequence to improve the bovine linkage map over the entire genome or on specific chromosomes was evaluated. Bovine EST sequence was used to provide primer sequences corresponding to bovine genes, while human genomic sequence directed primer design to flank introns and produce amplicons of appropriate size for efficient direct sequencing. The sequence tagged sites (STS) produced in this way from the four sires of the MARC reference families were examined for single nucleotide polymorphisms (SNPs) that could be used to map the corresponding genes. With this approach, along with a primer/extension mass spectrometry SNP genotyping assay, 100 ESTs were placed on the bovine genetic linkage map. The first 70 were chosen at random from bovine EST–human genomic comparisons. An additional 30 ESTs were successfully mapped to bovine Chromosome 19 (BTA19), and comparison of the resulting BTA19 map to the position of the corresponding human orthologs on the HSA17 draft sequences revealed differences in the spacing and order of genes. Over 80% of successful amplicons contained SNPs, indicating that this is an efficient approach to generating EST-associated genetic markers. We have demonstrated the feasibility of constructing a linkage map based on SNPs associated with ESTs and the plausibility of utilizing EST, comparative mapping information, and human sequence data to target regions of the bovine genome for SNP marker development.

15.
MOTIVATION: Consensus sequence generation is important in many kinds of sequence analysis ranging from sequence assembly to profile-based iterative search methods. However, how can a consensus be constructed when its inherent assumption, that the aligned sequences form a single linear consensus, is not true? RESULTS: Partial Order Alignment (POA) enables construction and analysis of multiple sequence alignments as directed acyclic graphs containing complex branching structure. Here we present a dynamic programming algorithm (heaviest_bundle) for generating multiple consensus sequences from such complex alignments. The number and relationships of these consensus sequences reveal the degree of structural complexity of the source alignment. This is a powerful and general approach for analyzing and visualizing complex alignment structures, and can be applied to any alignment. We illustrate its value for analyzing expressed sequence alignments to detect alternative splicing, reconstruct full-length mRNA isoform sequences from EST fragments, and separate paralog mixtures that can cause incorrect SNP predictions. AVAILABILITY: The heaviest_bundle source code is available at http://www.bioinformatics.ucla.edu/poa
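The general idea of reading a consensus out of an alignment graph with dynamic programming can be sketched in Python. This is not the published heaviest_bundle code: the function `heaviest_path`, the input format, and the edge weights (taken here to be the number of reads traversing each edge) are assumptions for illustration. The sketch finds only a single maximum-weight path through a directed acyclic graph; extracting multiple consensus sequences, as the abstract describes, would need additional passes that are omitted here.

```python
from collections import defaultdict


def heaviest_path(nodes, edges):
    """Find the maximum-weight path in a DAG and return (sequence, weight).

    nodes: list of (node_id, base) pairs, given in topological order.
    edges: dict mapping (src_id, dst_id) to a non-negative weight.
    Both formats are hypothetical, chosen to keep the example small.
    """
    if not nodes:
        return "", 0
    base_of = dict(nodes)
    incoming = defaultdict(list)
    for (src, dst), weight in edges.items():
        incoming[dst].append((src, weight))

    best = {}  # best total weight of any path ending at each node
    prev = {}  # predecessor of each node on that best path
    for node_id, _ in nodes:
        best[node_id], prev[node_id] = 0, None
        for src, weight in incoming[node_id]:
            # src precedes node_id in topological order, so best[src] is final.
            if best[src] + weight > best[node_id]:
                best[node_id], prev[node_id] = best[src] + weight, src

    # Trace back from the node where the heaviest path ends.
    end = max(best, key=best.get)
    path, node = [], end
    while node is not None:
        path.append(node)
        node = prev[node]
    path.reverse()
    return "".join(base_of[n] for n in path), best[end]


# Toy graph: after A-C the alignment branches into G (3 reads) or T (2 reads),
# then rejoins at A.
nodes = [(0, "A"), (1, "C"), (2, "G"), (3, "T"), (4, "A")]
edges = {(0, 1): 5, (1, 2): 3, (1, 3): 2, (2, 4): 3, (3, 4): 2}
print(heaviest_path(nodes, edges))
```

In the toy graph the branch through G carries more reads than the branch through T, so the traced path spells the majority sequence.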

16.
MicroRNAs (miRNAs) are a class of noncoding small RNAs that regulate gene expression by base pairing with target mRNAs at the 3'-terminal untranslated regions (3'-UTRs), leading to mRNA cleavage or translational repression. Single-nucleotide polymorphisms (SNPs) located at miRNA-binding sites (miRNA-binding SNPs) are likely to affect the expression of the miRNA target and may contribute to the susceptibility of humans to common diseases. We herein performed a genome-wide analysis of SNPs located in the miRNA-binding sites of the 3'-UTR of various human genes. We found that miRNA-binding SNPs are negatively selected with respect to SNP distribution between the miRNA-binding 'seed' sequence and the entire 3'-UTR sequence. Furthermore, we comprehensively defined the expression of each miRNA-binding SNP in cancers versus normal tissues through mining EST databases. Interestingly, we found that some miRNA-binding SNPs exhibit significantly different allele frequencies between the human cancer EST libraries and the dbSNP database. More importantly, using human cancer specimens against the dbSNP database for case-control association studies, we found that twelve miRNA-binding SNPs indeed display an aberrant allele frequency in human cancers. Hence, SNPs located in miRNA-binding sites affect miRNA target expression and function, and are potentially associated with cancers.

17.
18.
19.
Single nucleotide polymorphisms (SNPs) represent the most abundant type of genetic variation that can be used as molecular markers. The SNPs that are hidden in sequence databases can be unlocked using bioinformatic tools. For efficient application of these SNPs, the sequence set should be as error-free as possible, target single loci and be suitable for the SNP scoring platform of choice. We have developed a pipeline to effectively mine SNPs from public EST databases with or without quality information using QualitySNP software, select reliable SNPs and prepare the loci for analysis on the Illumina GoldenGate genotyping platform. The applicability of the pipeline was demonstrated using publicly available potato EST data, genotyping individuals from two diploid mapping populations and subsequently mapping the SNP markers (putative genes) in both populations. Over 7000 reliable SNPs were identified that met the criteria for genotyping on the GoldenGate platform. Of the 384 SNPs on the SNP array approximately 12% dropped out. For the two potato mapping populations, 165 and 185 segregating SNP loci could be mapped on the respective genetic maps, illustrating the effectiveness of our pipeline for SNP selection and validation.

20.
Expressed sequence tags (ESTs) currently encompass more entries in the public databases than any other form of sequence data. Thus, EST data sets provide a vast resource for gene identification and expression profiling. We have mapped the complete set of 176,915 publicly available Arabidopsis EST sequences onto the Arabidopsis genome using GeneSeqer, a spliced alignment program incorporating sequence similarity and splice site scoring. About 96% of the available ESTs could be properly aligned with a genomic locus, with the remaining ESTs deriving from organelle genomes and non-Arabidopsis sources or displaying insufficient sequence quality for alignment. The mapping provides verified sets of EST clusters for evaluation of EST clustering programs. Analysis of the spliced alignments suggests corrections to current gene structure annotation and provides examples of alternative and non-canonical pre-mRNA splicing. All results of this study were parsed into a database and are accessible via a flexible Web interface at http://www.plantgdb.org/AtGDB/.
