Similar articles
 20 similar articles found
1.
With the advent of high-throughput sequencing technology, sequences from many genomes are being deposited in public databases at a brisk rate. Open access to large amounts of expressed sequence tag (EST) data in the public databases has provided a powerful platform for simple sequence repeat (SSR) development in species where sequence information is not available. SSRs are markers of choice for their high reproducibility, abundant polymorphism and high inter-specific transferability. Mining SSRs from ESTs requires several high-throughput computational tools that must be executed individually, which is computationally intensive and time consuming. To reduce the time lag and to streamline the cumbersome process of SSR mining from ESTs, we have developed a user-friendly, web-based EST-SSR pipeline, "EST-SSR-MARKER PIPELINE (ESMP)". This pipeline integrates EST pre-processing, clustering, assembly and the subsequent mining of SSRs from assembled EST sequences. The mining of SSRs from ESTs provides valuable information on the abundance of SSRs in ESTs and will facilitate the development of markers for genetic analysis and related applications such as marker-assisted breeding. AVAILABILITY: The database is available for free at http://bioinfo.aau.ac.in/ESMP.

2.
The germplasm of the genus Nicotiana contains more than 5,000 accessions and plays an important role in modern biological research. Tobacco can be used as a model system to develop methodologies for plant transformation and for investigating gene function. To advance the study of Nicotiana, a large quantity of data on germplasm, sequences, molecular markers and genetically modified tobacco needed in-depth, systematic collation, so it became necessary to establish a dedicated database for tobacco genetics and breeding. The tobacco genetics and breeding (TGB, http://yancao.sdau.edu.cn/tgb) database was developed with the aim of bringing together tobacco genetics and breeding. The database has three main features: (1) a materials database with information on 1,472 Nicotiana germplasm accessions, as well as updated genomic and expressed sequence tag (EST) data available from the public database; (2) a molecular markers database containing a total of 12,388 potential intron polymorphism, 10,551 EST-simple sequence repeat (EST-SSR) and 66,297 genomic-SSR markers; and (3) an applications database with genetic maps and some studies of genetically modified tobacco. The TGB database also makes Basic Local Alignment Search Tool and primer design tools publicly available. As far as can be ascertained, the TGB database is the first tobacco genetics and breeding database to be created, and this comprehensive information will aid basic research into Nicotiana and other related plants. It will serve as an excellent resource for the online tobacco research community.

3.
The advent of large-scale DNA sequencing technology has generated a tremendous amount of sequence information for many important organisms. We have developed a rapid and efficient PCR-based technique, which uses bioinformatics tools and expressed sequence tag (EST) database information to generate polymorphic markers around targeted candidate gene sequences. This target region amplification polymorphism (TRAP) technique uses 2 primers of 18 nucleotides to generate markers. One of the primers, the fixed primer, is designed from the targeted EST sequence in the database; the second primer, the arbitrary primer, is an arbitrary sequence with either an AT- or GC-rich core to anneal with an intron or exon, respectively. PCR amplification is run for the first 5 cycles with an annealing temperature of 35°C, followed by 35 cycles with an annealing temperature of 50°C. For different plant species, each PCR reaction can generate as many as 50 scorable fragments with sizes ranging from 50–900 bp when separated on a 6.5% polyacrylamide sequencing gel. The TRAP technique should be useful in genotyping germplasm collections and in tagging genes governing desirable agronomic traits of crop plants.

4.
Xin D, Sun J, Wang J, Jiang H, Hu G, Liu C, Chen Q. Molecular Biology Reports, 2012, 39(9): 9047-9057
Microsatellites, or simple sequence repeats (SSRs), are very useful molecular markers for a number of plant species. We used a new publicly available module (TROLL) to extract microsatellites from the public database of soybean expressed sequence tag (EST) sequences. A total of 12,833 sequences containing di- to penta-type SSRs were identified from 200,516 non-redundant soybean ESTs. On average, one SSR was found per 7.25 kb of EST sequences, with the tri-nucleotide motifs being the most abundant. Primer sequences flanking the SSR motifs were successfully designed for 9,638 soybean ESTs using the software primer3.0, and only 59 pairs of them were found in earlier studies. We synthesized 124 pairs of the primers to determine the polymorphism and heterozygosity among eight soybean genotypes, which represented a wide range of cultivated soybean cultivars. PCR amplification products with anticipated SSRs were obtained with 81 pairs of primers; 36 PCR products appeared to be homozygous and the remaining 45 PCR products appeared to be heterozygous and displayed polymorphism among the eight cultivars. We further analysed the EST sequences containing the 45 polymorphic EST-SSR markers using the programs BLASTN and BLASTX. Sequence alignment showed that 29 ESTs have homologous sequences and 15 ESTs could be classified into a Uni-gene cluster with comparatively convincing protein products. Among these 15 ESTs belonging to a Uni-gene cluster, 9 SSRs were located in the 3'-UTR, 4 SSRs were located in the intron region and 2 SSRs were located in the CDS region. None of these SSRs was located in the 5'-UTR. These novel SSRs identified in the ESTs of soybean provide useful information for gene mapping and cloning in future studies.
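The extraction step described in this abstract (finding di- to penta-nucleotide repeats in EST sequences and reporting their density) can be illustrated with a short regular-expression sketch. This is not the TROLL module used by the authors; the function names `find_ssrs` and `kb_per_ssr` and the minimum repeat counts are hypothetical choices made only for illustration.

```python
import re

# Minimum number of repeat units per motif length.
# Illustrative thresholds only; they are NOT taken from TROLL or the abstract.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for size, min_n in sorted(min_repeats.items()):
        # One motif of `size` bases followed by at least (min_n - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "AAA" (mononucleotide run) or "ATAT" (really an AT repeat).
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)


def kb_per_ssr(sequences):
    """Average kilobases of sequence per detected SSR (inf if none found)."""
    total_bp = sum(len(s) for s in sequences)
    total_ssrs = sum(len(find_ssrs(s)) for s in sequences)
    return total_bp / 1000 / total_ssrs if total_ssrs else float("inf")


# Toy sequence with one dinucleotide and one trinucleotide repeat.
example = "GGC" + "AT" * 7 + "CCGT" + "AAG" * 5 + "TTC"
print(find_ssrs(example))  # [(3, 'AT', 7), (21, 'AAG', 5)]
```

A dedicated tool would typically also handle imperfect or compound repeats and ambiguous bases, which this sketch ignores.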

5.
Development and annotation of perennial Triticeae ESTs and SSR markers   Cited by: 2 (self-citations: 0; citations by others: 2)
Triticeae contains hundreds of species of both annual and perennial types. Although substantial genomic tools are available for annual Triticeae cereals such as wheat and barley, the perennial Triticeae lack sufficient genomic resources for genetic mapping or diversity research. To increase the amount of sequence information available in the perennial Triticeae, three expressed sequence tag (EST) libraries were developed and annotated for Pseudoroegneria spicata, a mixture of both Elymus wawawaiensis and E. lanceolatus, and a Leymus cinereus x L. triticoides interspecific hybrid. The ESTs were combined into unigene sets of 8 780 unigenes for P. spicata, 11 281 unigenes for Leymus, and 7 212 unigenes for Elymus. Unigenes were annotated based on putative orthology to genes from rice, wheat, barley, other Poaceae, Arabidopsis, and the non-redundant database of the NCBI. Simple sequence repeat (SSR) markers were developed, tested for amplification and polymorphism, and aligned to the rice genome. Leymus EST markers homologous to rice chromosome 2 genes were syntenous on Leymus homeologous groups 6a and 6b (previously 1b), demonstrating promise for in silico comparative mapping. All ESTs and SSR markers are available on an EST information management and annotation database (http://titan.biotec.uiuc.edu/triticeae/).

6.
Expressed sequence tag (EST) sequences available in the public databases provide a cost-effective and valuable genomic resource for the development of molecular markers. Introns, the non-coding DNA sequences within a gene, could be used as potential molecular markers because they are highly variable compared to the coding sequences. This study reports the development of intron length polymorphism markers in cowpea [Vigna unguiculata (L.) Walp.]. The ESTs of cowpea were aligned with genomic sequences of Arabidopsis and soybean to predict the position and number of introns in cowpea. Of the 110 PCR primer pairs designed to amplify the intronic regions, 98 primer pairs resulted in successful amplification and were identified as cowpea intron length polymorphism (CILP) markers. Of the 45 randomly selected CILP markers, 36% produced length variation in the ten cowpea genotypes, collectively yielding 33 alleles with an average of 2.0 alleles/locus. The polymorphism information content of the CILP markers ranged from 0.18 to 0.64 with an average of 0.34. Of the 98 CILP markers, 93 markers (95%) showed transferability to other Vigna species. Dendrograms based on CILP markers clearly distinguished the cowpea genotypes as well as other Vigna species, demonstrating the utility of CILP markers in genetic diversity and phylogenetic studies. These CILP markers will be very useful in the genome analysis and marker-assisted breeding of cowpea and other Vigna species.
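The abstract reports polymorphism information content (PIC) values without stating how they were computed. A minimal sketch follows, assuming the widely used definition of Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of allele i at a locus; the function name `pic` is hypothetical, and the authors may have used a different estimator.

```python
from itertools import combinations


def pic(allele_counts):
    """Polymorphism information content for one locus.

    `allele_counts` may hold raw counts or frequencies; values are
    normalised to frequencies before the formula is applied.
    """
    total = sum(allele_counts)
    p = [c / total for c in allele_counts]
    # Expected homozygosity: sum of squared allele frequencies.
    homozygosity = sum(x * x for x in p)
    # Correction term for matings between identical heterozygotes.
    correction = sum(2 * a * a * b * b for a, b in combinations(p, 2))
    return 1 - homozygosity - correction


print(pic([0.5, 0.5]))  # 0.375, two equally frequent alleles
```

Under this definition a monomorphic locus has a PIC of 0, and the value rises as alleles become more numerous and more evenly distributed.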

7.
Microsatellites are the markers of choice due to their high abundance, reproducibility, degree of polymorphism and co-dominant nature. They are mainly used for studying genetic variability in different species and for marker-assisted selection. Expressed sequence tags (ESTs) serve as the main resource for simple sequence repeats (SSRs). The computational approach to detecting SSRs and developing SSR markers from EST-SSRs is preferred over conventional methods because it greatly reduces time and cost. The available EST sequence databases, various web interfaces and standalone tools provide a platform for easy analysis of EST sequences, leading to the development of potential EST-SSR markers. This paper is an overview of the in silico approach to developing SSR markers from EST sequences using some of the most efficient tools that are freely available for academic purposes.

8.
FELINES (Finding and Examining Lots of Intron 'N' Exon Sequences) is a utility written to automate construction and analysis of high quality intron and exon sequence databases produced from EST (expressed sequence tag) to genomic sequence alignments. We demonstrated the various programs of the FELINES utility by creating intron and exon sequence databases for the fungal organism Schizosaccharomyces pombe from alignments of EST to genomic sequences. In addition, we analyzed our constructed S.pombe sequence databases and the well-established Saccharomyces cerevisiae intron database from Manuel Ares' Laboratory for conserved sequence motifs. FELINES was shown to be useful for characterizing branchsites, polypyrimidine tracts and 5' and 3' splice sites in the intron databases and exonic splicing enhancers (ESEs) in S.pombe exons. FELINES is available at http://www.genome.ou.edu/informatics.html.

9.
Expressed sequence tag (EST) databases represent a potentially valuable resource for the development of molecular markers for use in evolutionary studies. Because EST-derived markers come from transcribed regions of the genome, they are likely to be conserved across a broader taxonomic range than are other sorts of markers. This paper describes a case study in which the publicly available cultivated sunflower (Helianthus annuus) EST database was used to develop simple sequence repeat (SSR) markers for use in the genetic analysis of a rare sunflower species, Helianthus verticillatus, as well as the more widespread Helianthus angustifolius. EST-derived SSRs were found to be more than 3 times as transferable across species as compared with anonymous SSRs (73% vs. 21%, respectively). Moreover, EST-SSRs whose primers were located within protein-coding sequence were more readily transferable than those derived from untranslated regions, and the former loci were no less variable than the latter. The utility of existing EST databases as a means for facilitating population genetic analyses in plants was further explored by cross-referencing publicly available EST resources against available lists of rare or invasive flowering plant taxa. This survey revealed that more than one-third of all plant-derived EST collections of sufficient size could conceivably serve as a source of EST-SSRs for the analysis of rare, endangered, or invasive plant species worldwide.

10.
With a long-term goal of constructing a linkage map of Rhododendron enriched with gene-specific markers, we utilized Rhododendron catawbiense ESTs for the development of high-efficiency (in terms of polymorphism detection frequency) PCR-based markers. Using the gene-sequence alignment between Rhododendron ESTs and the genomic sequences of Arabidopsis homologs, we developed 'intron-flanking' EST–PCR-based primers that would anneal in conserved exon regions and amplify across the more highly diverged introns. These primers resulted in increased efficiency (61% vs. 13%; 4.7-fold) of polymorphism detection compared with conventional EST–PCR methods, supporting the assumption that intron regions are more diverged than exons. Significantly, this study demonstrates that the Arabidopsis genome database can be useful in developing gene-specific PCR-based markers for other non-model plant species for which EST data are available but genomic sequences are not. The comparative analysis of intron sizes between Rhododendron and Arabidopsis (made possible in this study by aligning Rhododendron ESTs with Arabidopsis genomic sequences and sequencing Rhododendron genomic PCR products) provides the first insight into the gene structure of Rhododendron. Electronic Supplementary Material Supplementary material is available for this article at

11.
High-resolution analysis for population genetic and functional studies requires the use of large numbers of polymorphic markers. The recent increase of available genetic tools is facilitated by the use of publicly available expressed sequence tag (EST) sequence databases, which are a valuable resource for identifying gene-linked markers. In the present study, we applied bioinformatics analyses to identify microsatellite markers present in EST sequences from a zebra finch (Taeniopygia guttata) EST database and explored the success of cross-species amplification of EST-linked microsatellite markers in 7 passerine and 1 nonpasserine species. Eighty-six zebra finch EST-linked microsatellite loci were screened for polymorphism, revealing a high amplification success rate and adequate levels of polymorphism (33.3-51%) for relatively closely related species, whereas success decreased in the species most distantly related to zebra finch. EST-linked microsatellites appear to be more highly transferable between taxa than anonymous microsatellites, as they revealed higher amplification and polymorphism success between different families, indicating that they will be a useful source of gene-linked polymorphic markers in a broad range of avian species.

12.
As a case study for single-nucleotide polymorphism (SNP) identification in species for which little or no sequence information is available, we investigated several approaches to identifying SNPs in two passerine bird species: pied and collared flycatchers (Ficedula hypoleuca and F. albicollis). All approaches were successful in identifying sequence polymorphism and over 50 candidate SNPs per species were identified from approximately 9.1 kb of sequence. In addition, 17 sites were identified in which the frequency of alternative bases differed by > 50% between species (termed interspecific SNPs). Interestingly, polymorphism of microsatellite/intron loci in the source species appeared to be a positive predictor of nucleotide diversity in homologous flycatcher sequences. The overall nucleotide diversity of flycatchers was 2.3-2.7 × 10⁻³, which is approximately 3-6 times higher than observed in recent studies of human SNPs. Higher nucleotide diversity in the avian genome could be due to the relatively older age of flycatcher populations, compared with humans, and/or a higher long-term effective population size.
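Nucleotide diversity, the statistic quoted in this abstract, is commonly defined as the average number of pairwise differences per site among sampled sequences. The sketch below assumes that definition and a gap-free alignment of equal-length sequences; the function name `nucleotide_diversity` and the toy alignment are hypothetical, and the authors' exact estimator is not given in the abstract.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Average pairwise differences per site for equal-length sequences."""
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(aligned, 2))
    # Count mismatching positions over every pair of sequences.
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


# Toy alignment: four sequences of 10 bp with two segregating sites.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGTACGTAA"]
print(nucleotide_diversity(toy))  # 0.1
```

Real data would also need a policy for gaps and missing bases, which is left out here.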

13.
Amplified fragment length polymorphism (AFLP) is often used for genetic mapping and diversity analysis, but very little information is currently available on the sequence characteristics of AFLP markers. Species-specific sequences were analyzed from a single Coffea genome (Coffea pseudozanguebariae) associated with clustered or nonclustered AFLP loci of known genetic position. Compared with the expressed sequence tag (EST) sequence composition, their AT content exhibited a bimodal distribution with AT-poor sequences corresponding mainly to putative coding sequences. AT-rich sequences, apart from the EST distribution, were usually clustered on the genetic map and might correspond to noncoding sequences. Conversion of these AFLP markers into sequence-characterized amplified region (SCAR) anchor markers allowed us to assess sequence conservation within Coffea species with respect to species relatedness.

14.
Sputnik: a database platform for comparative plant genomics   Cited by: 10 (self-citations: 0; citations by others: 10)

15.
Aquaporins, members of major intrinsic proteins (MIPs), transport water across cellular membranes and play vital roles in all organisms. Adversities such as drought, salinity, or chilling affect water uptake and transport, and numerous plant MIPs are reported to be differentially regulated under such stresses. However, MIP genes have not yet been characterized in wheat, the largest cereal crop. We have identified 24 PIP and 11 TIP aquaporin genes from wheat by gene isolation and database searches. They vary extensively in lengths, numbers, and sequences of exons and introns, and sequences and cellular locations of predicted proteins, but the intron positions (if present) are characteristic. The putative PIP proteins show a high degree of conservation of signature sequences or residues for membrane integration, water transport, and regulation. The TIPs are more diverse, some with potential for water transport and others with various selectivity filters including a new combination. Most genes appear to be expressed as expressed sequence tags, while two are likely pseudogenes. Many of the genes are highly identical to rice but some are unique, and many correspond to genes that show differential expression under salinity and/or drought. The results provide extensive information for functional studies and developing markers for stress tolerance. Electronic supplementary material The online version of this article (doi:) contains supplementary material, which is available to authorized users.

16.
The sequencing and detailed comparative functional analysis of genomes of a number of select botanical models open new doors into comparative genomics among the angiosperms, with potential benefits for improvement of many orphan crops that feed large populations. In this study, a set of simple sequence repeat (SSR) markers was developed by mining the expressed sequence tag (EST) database of sorghum. Among the SSR-containing sequences, only those sharing considerable homology with rice genomic sequences across the lengths of the 12 rice chromosomes were selected. Thus, 600 SSR-containing sorghum EST sequences (50 homologous sequences on each of the 12 rice chromosomes) were selected, with the intention of providing coverage for corresponding homologous regions of the sorghum genome. Primer pairs were designed and polymorphism detection ability was assessed using parental pairs of two existing sorghum mapping populations. About 28% of these new markers detected polymorphism in this 4-entry panel. A subset of 55 polymorphic EST-derived SSR markers were mapped onto the existing skeleton map of a recombinant inbred population derived from cross N13 × E 36-1, which is segregating for Striga resistance and the stay-green component of terminal drought tolerance. These new EST-derived SSR markers mapped across all 10 sorghum linkage groups, mostly to regions expected based on prior knowledge of rice–sorghum synteny. The ESTs from which these markers were derived were then mapped in silico onto the aligned sorghum genome sequence, and 88% of the best hits corresponded to linkage-based positions. This study demonstrates the utility of comparative genomic information in targeted development of markers to fill gaps in linkage maps of related crop species for which sufficient genomic tools are not available.

17.
MOTIVATION: Accurate gene structure annotation is a challenging computational problem in genomics. The best results are achieved with spliced alignment of full-length cDNAs or multiple expressed sequence tags (ESTs) with sufficient overlap to cover the entire gene. For most species, cDNA and EST collections are far from comprehensive. We sought to overcome this bottleneck by exploring the possibility of using combined EST resources from fairly diverged species that still share a common gene space. Previous spliced alignment tools were found inadequate for this task because they rely on very high sequence similarity between the ESTs and the genomic DNA. RESULTS: We have developed a computer program, GeneSeqer, which is capable of aligning thousands of ESTs with a long genomic sequence in a reasonable amount of time. The algorithm is uniquely designed to tolerate a high percentage of mismatches and insertions or deletions in the EST relative to the genomic template. This feature allows use of non-cognate ESTs for gene structure prediction, including ESTs derived from duplicated genes and homologous genes from related species. The increased gene prediction sensitivity results in part from novel splice site prediction models that are also available as a stand-alone splice site prediction tool. We assessed GeneSeqer performance relative to a standard Arabidopsis thaliana gene set and demonstrate its utility for plant genome annotation. In particular, we propose that this method provides a timely tool for the annotation of the rice genome, using abundant ESTs from other cereals and plants. AVAILABILITY: The source code is available for download at http://bioinformatics.iastate.edu/bioinformatics2go/gs/download.html. Web servers for Arabidopsis and other plant species are accessible at http://www.plantgdb.org/cgi-bin/AtGeneSeqer.cgi and http://www.plantgdb.org/cgi-bin/GeneSeqer.cgi, respectively. For non-plant species, use http://bioinformatics.iastate.edu/cgi-bin/gs.cgi. 
The splice site prediction tool (SplicePredictor) is distributed with the GeneSeqer code. A SplicePredictor web server is available at http://bioinformatics.iastate.edu/cgi-bin/sp.cgi

18.
Comparative cross-species alternative splicing in plants   Cited by: 1 (self-citations: 0; citations by others: 1)
Alternative splicing (AS) can add significantly to genome complexity. Plants are thought to exhibit less AS than animals. An algorithm, based on expressed sequence tag (EST) pairs gapped alignment, was developed that takes advantage of the relatively small intron and exon size in plants and directly compares pairs of ESTs to search for AS. EST pairs gapped alignment was first evaluated in Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), and tomato (Solanum lycopersicum) for which annotated genome sequence is available and was shown to accurately predict splicing events. The method was then applied to 11 plant species that include 17 cultivars for which enough ESTs are available. The results show a large, 3.7-fold difference in AS rates between plant species with Arabidopsis and rice in the lower range and lettuce (Lactuca sativa) and sorghum (Sorghum bicolor) in the upper range. Hence, compared to higher animals, plants show a much greater degree of variety in their AS rates and in some plant species the rates of animal and plant AS are comparable although the distribution of AS types may differ. In eudicots but not monocots, a correlation between genome size and AS rates was detected, implying that in eudicots the mechanisms that lead to larger genomes are a driving force for the evolution of AS.

19.
PIP (Potential Intron Polymorphism) and SSR (Simple Sequence Repeat) markers have been used in many species, but large-scale development and combined use of these two marker types have not been reported in tobacco. In this study, a total of 12,388 PIP and 76,848 SSR markers were designed and uploaded to a web-accessible database (http://yancao.sdau.edu.cn/tgb/). E-PCR analysis showed that PIP and SSR markers rarely overlapped and were strongly complementary in the tobacco genome. The density of markers was 3.07 PIP and 1.72 SSR per 10 kb of the known sequences. A total of 153 and 166 alleles were detected by 22 PIP and 22 SSR markers, respectively, in 64 Nicotiana accessions. SSR markers produced higher PIC (polymorphism information content) values and identified more alleles than PIP markers, whereas PIP markers could identify larger numbers of rare alleles. Mantel testing demonstrated a high correlation coefficient (r = 0.949, P < 0.001) between PIP and SSR. The UPGMA dendrogram created from the combined PIP and SSR markers was clearer and more reliable than the individual PIP or SSR dendrograms. These results suggest that PIP and SSR markers can make up the deficiency of molecular markers not only in tobacco but also in other plants.
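The Mantel test mentioned in this abstract compares two distance matrices computed on the same set of accessions. A minimal permutation-based sketch follows, assuming Pearson correlation of the upper triangles and a one-sided p-value; the function names, the default of 999 permutations and the fixed seed are hypothetical choices, and the abstract does not say which software or settings the authors used.

```python
import random
from math import sqrt


def _upper(matrix, order):
    """Upper-triangle values after relabelling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]]
            for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(d1, d2, permutations=999, seed=0):
    """Return (r, p) for two symmetric distance matrices of equal size."""
    identity = list(range(len(d1)))
    x = _upper(d1, identity)
    r_obs = _pearson(x, _upper(d2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel the objects of the second matrix
        if _pearson(x, _upper(d2, order)) >= r_obs:
            hits += 1
    # One-sided p-value; the observed arrangement counts as one permutation.
    return r_obs, (hits + 1) / (permutations + 1)


# Toy example: distances among five points on a line, and the same doubled.
points = [0, 1, 3, 6, 10]
d_a = [[abs(a - b) for b in points] for a in points]
d_b = [[2 * v for v in row] for row in d_a]
r, p = mantel(d_a, d_b)
```

Permuting object labels rather than individual matrix cells preserves the dependence among distances that share an accession, which is why an ordinary correlation test is not used here.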

20.
SSCP-SNP in pearl millet—a new marker system for comparative genetics   Cited by: 6 (self-citations: 0; citations by others: 6)
A considerable array of genomic resources are in place in pearl millet, and marker-aided selection is already in use in the public breeding programme at ICRISAT. This paper describes experiments to extend these publicly available resources to a single nucleotide polymorphism (SNP)-based marker system. A new marker system, single-strand conformational polymorphism (SSCP)-SNP, was developed using annotated rice genomic sequences to initially predict the intron-exon borders in millet expressed sequence tags (ESTs) and then to design primers that would amplify across the introns. An adequate supply of millet ESTs was available for us to identify 299 homologues of single-copy rice genes in which the intron positions could be precisely predicted. PCR primers were then designed to amplify approximately 500-bp genomic fragments containing introns. Analysis of these fragments on SSCP gels revealed considerable polymorphism. A detailed DNA sequence analysis of variation at four of the SSCP-SNP loci over a panel of eight inbred genotypes showed complex patterns of variation, with about one SNP or indel (insertion-deletion) every 59 bp in the introns, but considerably fewer in the exons. About two-thirds of the variation was derived from SNPs and one-third from indels. Most haplotypes were detected by SSCP. As a marker system, SSCP-SNP has lower development costs than simple sequence repeats (SSRs), because much of the work is in silico, and similar deployment costs and through-put potential. The rates of polymorphism were lower but useable, with a mean PIC of 0.49 relative to 0.72 for SSRs in our eight inbred genotype panel screen. The major advantage of the system is in comparative applications. Syntenic information can be used to target SSCP-SNP markers to specific chromosomal regions or, conversely, SSCP-SNP markers can be used to unravel detailed syntenic relationships in specific parts of the genome. 
Finally, a preliminary analysis showed that the millet SSCP-SNP primers amplified in other cereals with a success rate of about 50%. There is also considerable potential to promote SSCP-SNP to a COS (conserved orthologous set) marker system for application across species by more specifically designing primers to precisely match the model genome sequence. Electronic Supplementary Material Supplementary material is available for this article at
