Similar articles
 Found 20 similar articles; search took 46 ms
1.
We used 81,635 expressed sequence tags (ESTs) derived from 12 different cDNA libraries of the silkworm, Bombyx mori, inbred strain Dazao (P50), to identify high-quality candidate single nucleotide polymorphisms (SNPs). PHRAP assembly yielded 12,980 contigs, including 11,537 contigs assembled from more than one read, and 101 candidate SNPs and 27 single-base insertions/deletions were identified from 117 contigs assembled from 1576 high-quality reads that were base-called with PHRED and screened on the basis of the neighborhood quality standard (NQS). We also predicted 40 SNPs in coding regions (cSNPs), of which 26 were predicted to be non-synonymous and 14 synonymous. The transition/transversion ratio of 1.66:1 differs from that of other insects. In this first SNP analysis of a lepidopteran, B. mori, the single nucleotide polymorphism density is estimated to be 1.3 × 10⁻³ on the basis of sequence diversity. This analysis shows that expressed sequences from multiple libraries may provide an abundant source of comparative reads to mine for cSNPs from the silkworm genome.
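The transition/transversion ratio quoted in the abstract above can be illustrated with a short sketch. The function names and the input format (a list of reference/alternate base pairs) are assumptions made for illustration; they are not taken from the study, which does not describe its tooling at this level.

```python
# Minimal sketch: classify substitutions and compute a transition/transversion ratio.
# Names and input format are hypothetical, chosen only for illustration.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref, alt):
    """Return 'transition' for purine<->purine or pyrimidine<->pyrimidine changes,
    'transversion' for purine<->pyrimidine changes."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a valid single-base substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_ratio(substitutions):
    """Ratio of transitions to transversions over (ref, alt) pairs."""
    pairs = list(substitutions)
    ts = sum(1 for ref, alt in pairs if classify_substitution(ref, alt) == "transition")
    tv = len(pairs) - ts
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ts / tv
```

For example, `ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "T")])` counts two transitions and one transversion.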

2.
We developed an automated pipeline for the detection of single nucleotide polymorphisms (SNPs) in expressed sequence tag (EST) data sets by combining three DNA sequence analysis programs: Phred, Phrap and PolyBayes. This application requires access to the individual electropherogram traces. First, a reference set of 65 SNPs was obtained from the sequencing of 30 gametes in 13 maritime pine (Pinus pinaster Ait.) gene fragments (6671 bp), resulting in a frequency of 1 SNP every 102.6 bp. Second, the parameters of the three programs were optimized to retrieve as many true SNPs as possible while keeping the false-positive rate as low as possible. Overall, the efficiency of detection of true SNPs was 83.1%. However, this rate varied widely with the frequency of the rarer SNP allele: from as low as 41% for rare SNP alleles (frequency < 10%) up to 98% for allele frequencies above 10%. Third, the detection method was applied to the 18498 assembled maritime pine (Pinus pinaster Ait.) ESTs, which allowed the identification of a total of 1400 candidate SNPs in contigs containing between 4 and 20 sequence reads. These genetic resources, described for the first time in a forest tree species, were made available at http://www.pierroton.inra/genetics/Pinesnps. We also derived an analytical expression for the SNP detection probability as a function of the SNP allele frequency, the number of haploid genomes used to generate the EST sequence database, and the sample size of the contigs considered for SNP detection. The frequency of the SNP allele was shown to be the main factor influencing the probability of SNP detection.
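The dependence of detection on allele frequency and contig depth can be sketched with a simple binomial model. This is not the analytical expression derived in the study, which is not given in the abstract: it assumes reads are sampled independently, ignores sequencing error and the number of haploid genomes behind the library, and the `min_reads` threshold is a hypothetical parameter.

```python
# Minimal sketch under stated assumptions (independent reads, no sequencing error).
# NOT the expression from the abstract above; an illustrative binomial model only.
from math import comb


def detection_probability(p, n, min_reads=1):
    """Probability that both alleles of a biallelic SNP are each seen in at
    least `min_reads` of the `n` reads in a contig, when the variant allele
    has population frequency `p`."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    if n < 0 or min_reads < 1:
        raise ValueError("n must be >= 0 and min_reads >= 1")
    # Sum the binomial probabilities for variant-read counts x that leave
    # at least `min_reads` reads for each allele.
    return sum(
        comb(n, x) * p**x * (1.0 - p) ** (n - x)
        for x in range(min_reads, n - min_reads + 1)
    )
```

Under this model a rare allele is far less likely to be represented in a shallow contig than a common one, which is the qualitative pattern the abstract reports.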

3.
Mining single-nucleotide polymorphisms from hexaploid wheat ESTs.   Total citations: 20 (self-citations: 0; citations by others: 20)
Single-nucleotide polymorphisms (SNPs) represent a new form of functional marker, particularly when they are derived from expressed sequence tags (ESTs). A bioinformatics strategy was developed to discover SNPs within a large wheat EST database and to demonstrate the utility of SNPs in genetic mapping and genetic diversity applications. A collection of > 90000 wheat ESTs was assembled into contiguous sequences (contigs), and 45 random contigs were then visually inspected to identify primer pairs capable of amplifying specific alleles. We estimate that homoeologue sequence variants occurred 1 in 24 bp and the frequency of SNPs between wheat genotypes was 1 SNP/540 bp (theta = 0.0069). Furthermore, we estimate that one diagnostic SNP test can be developed from every contig with 10-60 EST members. Thus, EST databases are an abundant source of SNP markers. Polymorphism information content for SNPs ranged from 0.04 to 0.50, and ESTs could be mapped into a framework of microsatellite markers using segregating populations. The results showed that SNPs in wheat can be discovered in ESTs, validated, and applied to conventional genetic studies.
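Polymorphism information content (PIC) can be computed from allele frequencies, as in the sketch below. Which formula the study used is an assumption here: the reported upper bound of 0.50 is consistent with the simplified form 1 − Σp², whose maximum for a biallelic marker is 0.5, whereas the full Botstein form peaks at 0.375 for two alleles. The function names are hypothetical.

```python
# Minimal sketch of two common PIC definitions; which one the study used is
# an assumption, and the function names are illustrative only.
from itertools import combinations


def _check(freqs):
    if any(f < 0 for f in freqs) or abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must be non-negative and sum to 1")


def pic_simplified(freqs):
    """Simplified PIC (gene diversity): 1 - sum(p_i^2)."""
    _check(freqs)
    return 1.0 - sum(f * f for f in freqs)


def pic_botstein(freqs):
    """Full PIC: 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    _check(freqs)
    cross = sum(2.0 * a * a * b * b for a, b in combinations(freqs, 2))
    return 1.0 - sum(f * f for f in freqs) - cross
```

For a SNP with two equally frequent alleles, `pic_simplified([0.5, 0.5])` gives 0.5 and `pic_botstein([0.5, 0.5])` gives 0.375.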

4.
Over 16,000 high-quality expressed sequence tags (ESTs) from red junglefowl (RJ) and White Leghorn (WL) brain and testis cDNA libraries were generated. Here, we have used this resource for detection of single nucleotide polymorphisms (SNPs), and also completed full-length sequencing of 46 pairs of clones, each pair representing the same gene from both the RJ and WL libraries. From the main set of ESTs, which were assembled using Phrap, 746 putative SNPs were identified, of which 76% were transitions and 24% were transversions. A subset of SNPs was evaluated by sequence analysis of five RJ and five WL birds. Nine of 12 SNPs were verified in this limited sample, suggesting that a majority of the putative polymorphisms documented in this study represent real SNPs. During full-length sequencing of the 46 RJ/WL clones, 100 SNPs were identified, which translated to a frequency of 1.90 SNPs/1000 bp. Transitions and transversions accounted for 77% and 23% of these SNPs, respectively, and non-synonymous and synonymous SNPs for 20% and 80%, respectively. Four large insertions/deletions were identified between the RJ and WL full-length sequences, and they appear to represent different splice variants.

5.
Using assembled expressed sequence tags (ESTs) from 14 different cDNA libraries containing 84,132 sequence reads, we identified 556 Populus candidate single nucleotide polymorphisms (SNPs). Because traces were not available from dbEST (http://www.ncbi.nlm.nih.gov/dbEST/index.html), stringent filters were used to identify reliable candidate SNPs. Sequence analysis indicated that the main types of substitutions among candidate SNPs were A/G and T/C transitions, which accounted for 22.0% and 30.8%, respectively. One hundred and ten candidate SNPs were tested. As a result, 38 candidate SNPs were confirmed by direct sequencing of PCR products amplified from six different individuals. Thirteen new SNPs in intron regions were found, and multiple SNPs were found to be located in both intron and exon regions of four contigs. Heterozygosity was found in all 47 candidate sites, and five SNP sites were heterozygous in all six samples. This is the first report of SNP identification in a tree species, and it reveals that assembled ESTs from multiple libraries in the public database may provide a rich source of comparative sequences for an SNP search in the poplar genome.

6.
The public EST (expressed sequence tag) databases represent an enormous but heterogeneous repository of sequences, including many from a broad selection of plant species and a wide range of distinct varieties. The significant redundancy within large EST collections makes them an attractive resource for rapid pre-selection of candidate sequence polymorphisms. Here we present a strategy that allows rapid identification of candidate SNPs in barley (Hordeum vulgare L.) using publicly available EST databases. Analysis of 271,630 EST sequences from different cDNA libraries, representing 23 different barley varieties, resulted in the generation of 56,302 tentative consensus sequences. In all, 8171 of these unigene sequences are members of clusters with six or more ESTs. By applying a novel SNP detection algorithm (SNiPpER) to these sequences, we identified 3069 candidate inter-varietal SNPs. In order to verify these candidate SNPs, we selected a small subset of 63 present in 36 ESTs. Of the 63 SNPs selected, we were able to validate 54 (86%) using a direct sequencing approach. For further verification, 28 ESTs were mapped to distinct loci within the barley genome. The polymorphism information content (PIC) and nucleotide diversity (π) values of the SNPs identified by the SNiPpER algorithm are significantly higher than those that were obtained by random sequencing. This demonstrates the efficiency of our strategy for SNP identification and the cost-efficient development of EST-based SNP markers. The first two authors contributed equally to this work.

7.
8.
9.
To identify new vaccine candidates, Eimeria tenella expressed sequence tags (ESTs) from public databases were analysed for secretory molecules with a specially developed automated in silico strategy termed DNAsignalP. A total of 12,187 ESTs were clustered into 2881 contigs followed by a blastx search, which resulted in a significant number of E. tenella contigs with homologies to entries in public databases. Amino acid sequences of appropriate homologous proteins were analysed for the occurrence of an N-terminal signal sequence using the algorithm signalP. The resulting list of 84 entries comprised 51 contigs whose deduced proteins showed homologies to proteins of apicomplexan parasites. Based on function or localisation, we selected candidate proteins classified as (i) secreted proteins of Apicomplexa parasites, (ii) secreted enzymes, and (iii) transport and signalling proteins. To verify our strategy experimentally, we used a functional complementation system in yeast. For five selected candidate proteins we found that these were indeed secreted. Our approach thus represents an efficient method to identify secretory and surface proteins out of EST databases.

10.
Single nucleotide polymorphisms (SNPs) are useful for characterizing allelic variation, for genome-wide mapping, and as a tool for marker-assisted selection. Discovery of SNPs through de novo sequencing is inefficient within cultivated tomato (Lycopersicon esculentum Mill.) because the polymorphism rate is more than ten-fold lower than the sequencing error rate. The availability of expressed sequence tag (EST) data has made it feasible to discover putative SNPs in silico prior to experimental verification. By exploiting redundancy among EST data available for different varieties among 148,373 tomato ESTs, we have identified candidate SNPs for use within cultivated germplasm pools. 1,245 contigs having three EST sequences of Rio Grande and three EST sequences of TA496 were used for SNP discovery. We detected 1 SNP for every 8,500 bases analyzed, with 101 candidate SNPs in 44 genes identified. Sixty-six SNPs could be recognized by restriction enzymes, and subsequent experimental verification using restriction digestion or CEL I digestion confirmed 83% of the putative polymorphisms tested. SNPs between TA496 and Rio Grande have a high probability (53%) of detecting polymorphisms between other L. esculentum varieties. Twenty-six SNPs in 18 unigenes were mapped to specific chromosomes. Two SNPs, LEOH23 and LEOH37, were shown to be linked to quantitative trait loci contributing to fruit color within elite breeding populations. These results suggest that the growing databases of DNA sequence will yield information that facilitates improvement within the germplasm pools that have contributed to productive modern varieties.

11.
Linkage mapping of gene-associated SNPs to pig chromosome 11   Total citations: 3 (self-citations: 0; citations by others: 3)
Single nucleotide polymorphisms (SNPs) were discovered in porcine expressed sequence tags (ESTs) orthologous to genes from human chromosome 13 (HSA13) and predicted to be located on pig chromosome 11 (SSC11). The SNPs were identified as sequence variants in clusters of EST sequences from pig cDNA libraries constructed in the Sino-Danish pig genome project. In total, 312 human gene sequences from HSA13 were used for similarity searches in our pig EST database. Pig ESTs showing significant similarity with HSA13 genes were clustered and candidate SNPs were identified. Allele frequencies for 26 SNPs were estimated in a group of 80 unrelated pigs from Danish commercial pig breeds: Duroc, Hampshire, Landrace and Large White. Eighteen of the 26 SNPs genotyped in the PiGMaP Reference Families were mapped by linkage analysis to SSC11. The EST-based SNPs published here are new genetic markers useful for linkage and association studies in commercial and experimental pig populations. This study represents the first gene-associated SNP linkage map of pig chromosome 11 and adds new comparative mapping information between SSC11 and HSA13. Furthermore, our data facilitate future studies aimed at the identification of interesting regions on pig chromosome 11, positional cloning and fine mapping of quantitative trait loci in pig.

12.
The US Wheat Genome Project, funded by the National Science Foundation, developed the first large public Triticeae expressed sequence tag (EST) resource. Altogether, 116,272 ESTs were produced, comprising 100,674 5' ESTs and 15,598 3' ESTs. These ESTs were derived from 42 cDNA libraries, which were created from hexaploid bread wheat (Triticum aestivum L.) and its close relatives, including diploid wheat (T. monococcum L. and Aegilops speltoides L.), tetraploid wheat (T. turgidum L.), and rye (Secale cereale L.), using tissues collected from various stages of plant growth and development and under diverse regimes of abiotic and biotic stress treatments. ESTs were assembled into 18,876 contigs and 23,034 singletons, or 41,910 wheat unigenes. Over 90% of the contigs contained fewer than 10 EST members, implying that the ESTs represented a diverse selection of genes and that genes expressed at low and moderate to high levels were well sampled. Statistical methods were used to study the correlation of gene expression patterns, based on the ESTs clustered in the 1536 contigs that contained at least 10 5' EST members and thus represented the most abundant genes expressed in wheat. Analysis further identified genes in wheat that were significantly upregulated (p < 0.05) in tissues under various abiotic stresses when compared with control tissues. Though a functional annotation cannot be assigned for many of these genes, it is likely that they play a role associated with the stress response. This study predicted the possible functionality for 4% of total wheat unigenes, which leaves the remaining 96% with their functional roles and expression patterns largely unknown. Nonetheless, the EST data generated in this project provide a diverse and rich source for gene discovery in wheat.

13.
Single nucleotide polymorphisms (SNPs) are the most abundant type of DNA polymorphism found in animal and plant genomes. They provide an important new source of molecular markers that are useful in genetic mapping, map-based positional cloning, quantitative trait locus mapping and the assessment of genetic distances between individuals. Very little is known about the frequency of SNPs in cassava. We have exploited the recently developed collection of cassava expressed sequence tags (ESTs) to detect SNPs in the five cultivars of cassava used to generate the sequences. The frequency of intra-cultivar and inter-cultivar SNPs after analysis of 111 contigs was one polymorphism per 905 and one per 1,032 bp, respectively, for a total of one per 509 bp. We have obtained further information on the frequency of SNPs in six cassava cultivars by analysis of 33 amplicons obtained from 3' EST and BAC end sequences. Overall, about 11 kb of DNA sequence was obtained for each cultivar. A total of 186 SNPs (136 and 50 from ESTs and BAC ends, respectively) were identified. Among these, 146 were intra-cultivar polymorphisms, while 80 were inter-cultivar polymorphisms. Thus the total frequency of SNPs was one per 62 bp. This information will help to develop new strategies for EST mapping as well as their association with phenotypic characteristics.

14.
We performed random sequencing of cDNAs from nine biologically or industrially important cultures of the industrially valuable fungus Aspergillus oryzae to obtain expressed sequence tags (ESTs). In total, 21,446 raw ESTs were accumulated and subsequently assembled into 7589 non-redundant consensus sequences (contigs). Among all contigs, 5491 (72.4%) were derived from only one particular culture. These included 4735 (62.4%) singletons, i.e. lone ESTs overlapping with no others. These data showed that using cultures grown under various conditions as cDNA sources enabled efficient collection of ESTs. BLAST searches against the public databases showed that 2953 (38.9%) of the EST contigs had significant similarity to deposited sequences with known functions, 793 (10.5%) were similar to hypothetical proteins, and the remaining 3843 (50.6%) showed no significant similarity to sequences in the databases. Culture-specific contigs were extracted on the basis of the EST frequency normalized by the total number for each culture condition. In addition, contig sequences were compared with sequence sets in eukaryotic orthologous groups (KOGs), and classified into the KOG functional categories.

15.
Two non-normalized cDNA libraries of uteri from Danish Landrace and Chinese Erhualian pigs were constructed, and 13,756 expressed sequence tags (ESTs) were randomly sequenced. The ESTs were clustered by Phrap software, and 6,139 distinct tentative consensus sequences were produced, including 2,730 contigs and 3,409 singlets. Using Blast tools, these 6,139 candidate genes were compared to the nr and nt databases; 5,210 of them were assigned putative functions, whereas 929 potentially represent new genes. Highly expressed genes appear to be associated with basic energy metabolism, transferase activity, localization, cellular physiological process, protein binding, and nucleic acid binding. Antileukoproteinase was the most highly expressed gene, corresponding to endometrial differentiation and conceptus or fetal development.

16.
17.
Kim JY  Park HS  Lim D  Jang HC  Park HS  Lee KT  Kim JS  Oh SI  Kweon MS  Kim TH  Choi BH. BMB reports 2011,44(4):238-243
We generated 16,993 expressed sequence tags (ESTs) from two libraries containing full-length cDNAs from the brain and liver of the Korean Jindo dog. An additional 365,909 ESTs from other dog breeds were identified from the NCBI dbEST database, and all ESTs were clustered into 28,514 consensus sequences using StackPack. We selected the 7,305 consensus sequences that could be assembled from at least five ESTs and estimated that 12,533 high-quality single nucleotide polymorphisms (SNPs) were present in 97,835 putative SNPs from the 7,305 consensus sequences. We identified 58 Jindo dog-specific SNPs in comparison to other breeds and predicted seven synonymous SNPs and ten non-synonymous SNPs. Using PolyPhen, a program that predicts changes in protein structure and potential effects on protein function caused by amino acid substitutions, three of the non-synonymous SNPs were predicted to result in changes in protein function for proteins expressed by three different genes (TUSC3, ITIH2, and NAT2).

18.
Single nucleotide polymorphisms in cytochrome P450 genes from barley   Total citations: 12 (self-citations: 0; citations by others: 12)
Plant cytochrome P450s are known to be essential in a number of economically important pathways of plant metabolism, but there are also many P450s of unknown function accumulating in expressed sequence tag (EST) and genomic databases. To detect trait associations that could assist in the assignment of gene function and provide markers for breeders selecting for commercially important traits, detection of polymorphisms in identified P450 genes is desirable. Polymorphisms in EST sequences provide so-called perfect markers for the associated genes. The International Triticeae EST Cooperative database of 24,344 ESTs was searched for sequences exhibiting homology to P450 genes representing the nine known clans of plant P450s. Seventy-five P450 ESTs were identified, of which 24 had best matches in GenBank to P450 genes of known function and 51 to P450s of unknown function. Sequence information from PCR products amplified from the genomic template DNA of 11 barley varieties was obtained using primers designed from six barley P450 ESTs and one durum wheat P450 EST. Single nucleotide polymorphisms (SNPs) between barley varieties were identified using five of the seven PCR products. A maximum of five SNPs and three haplotypes among the 11 barley lines were detected in products from any one primer pair. SNPs in three PCR products led to changes between barley varieties in at least one restriction site, enabling genotyping and mapping without the expense of a specialist SNP detection system. The overall frequency of SNPs across the 11 barley varieties was 1 every 131 bases.
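The idea that a SNP can create or destroy a restriction site, and so be genotyped by digestion, can be sketched as follows. The function names, the toy sequence, and the choice of the EcoRI recognition site GAATTC are illustrative assumptions; the abstract does not name the enzymes or sequences involved.

```python
# Minimal sketch: does a single-base change alter the set of restriction sites?
# Only the given strand is scanned, which suffices for palindromic recognition
# sites such as GAATTC; all names and example data are hypothetical.
def find_sites(seq, site):
    """Return the start positions of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == site]


def snp_alters_site(seq, pos, alt, site):
    """True if replacing seq[pos] with `alt` creates or destroys a site."""
    if not 0 <= pos < len(seq):
        raise ValueError("SNP position is outside the sequence")
    variant = seq[:pos] + alt + seq[pos + 1:]
    return find_sites(seq, site) != find_sites(variant, site)
```

In the toy sequence `"AAGAATTCAA"`, changing position 4 from A to G destroys the GAATTC site, so `snp_alters_site("AAGAATTCAA", 4, "G", "GAATTC")` is true, while a change at position 9 leaves the site intact.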

19.
Tomato SNP Discovery by EST Mining and Resequencing   Total citations: 6 (self-citations: 0; citations by others: 6)
Many economically important crop species are relatively depauperate in genetic diversity (e.g., soybean, peanut, tomato). DNA polymorphism within cultivated tomato has been estimated to be low based on molecular markers. Through mining of more than 148,000 public tomato expressed sequence tags (ESTs) and full-length cDNAs, we identified 764 EST clusters with potential single nucleotide polymorphisms (SNPs) among more than 15 tomato lines. By sequencing regions from 53 of these clusters in two to three lines, we discovered a wealth of nucleotide polymorphism (62 SNPs and 12 indels in 21 Unigenes), resulting in a verification rate of 27.2% (28 of 103 SNPs predicted in EST clusters were verified). We hypothesize that five regions with 1.6–13-fold more diversity relative to other tested regions are associated with introgressions from wild relatives. Identifying polymorphic, expressed genes in the tomato genome will be useful for both tomato improvement and germplasm conservation.

20.
L D Chaves  J A Rowe  K M Reed. Génome 2005,48(1):12-17
Genome characterization and analysis is an imperative step in identifying and selectively breeding for improved traits of agriculturally important species. Expressed sequence tags (ESTs) represent a transcribed portion of the genome and are an effective way to identify genes within a species. Downstream applications of EST projects include DNA microarray construction and interspecies comparisons. In this study, 694 ESTs were sequenced and analyzed from a library derived from a 24-day-old turkey embryo. The 437 unique sequences identified were divided into 76 assembled contigs and 361 singletons. The majority of significant comparative matches occurred between the turkey sequences and sequences reported from the chicken. Whole genome sequence from the chicken was used to identify potential exon-intron boundaries for selected turkey clones and intron-amplifying primers were developed for sequence analysis and single nucleotide polymorphism (SNP) discovery. Identified SNPs were genotyped for linkage analysis on two turkey reference populations. This study significantly increases the number of EST sequences available for the turkey.
