Similar articles
 20 similar articles found (search time: 46 ms)
1.
Expressed sequence tags (ESTs) represent 500-1000-bp-long sequences corresponding to mRNAs derived from different sources (cell lines, tissues, etc.). The human EST database contains over 8,000,000 sequences, with over 4,000,000,000 total nucleotides. RNA molecules are transcribed from a genomic DNA template; therefore, all ESTs should match corresponding genomes. Nevertheless, we have found in the human EST database approximately 11,000 ESTs not matching sequences in the human genome database. The presence of "trash" ESTs (TESTs) in the EST database could result from DNA or RNA contamination of the laboratory equipment, tissues, or cell lines. TESTs could also represent sequences from unidentified human genes or from species inhabiting the human body. Here, we attempt to identify the sources of human EST database contaminations. In particular, we discuss systematic contamination of the mammalian EST databases with sequences of plants.

2.
This study documents insights from cDNA and EST analysis of the fungus Antrodia cinnamomea. A sequence library was constructed and analysed using standard procedures, yielding 65 ESTs ranging in size from 300 to 2000 bp. Of these, 46 ESTs had definite annotations, 18 were hypothetical, and 1 represented a new protein identified by BLAST analysis. We assigned 227 Gene Ontology terms linked to cell composition, transport, catalytic activity, and regulation functions to these sequences. Moreover, 56 matching genes were found in 8 Kyoto Encyclopedia of Genes and Genomes pathways. The data also showed 271 SSRs in Antrodia cinnamomea ESTs, with an occurrence frequency of 96.82%. STRING analysis showed that 29 genes encode enzymes highly involved in protein-protein interactions linked to the expression of regulatory functions. These results document insights from the cDNA and EST analysis of Antrodia cinnamomea for further data mining.

3.
The public EST (expressed sequence tag) databases represent an enormous but heterogeneous repository of sequences, including many from a broad selection of plant species and a wide range of distinct varieties. The significant redundancy within large EST collections makes them an attractive resource for rapid pre-selection of candidate sequence polymorphisms. Here we present a strategy that allows rapid identification of candidate SNPs in barley (Hordeum vulgare L.) using publicly available EST databases. Analysis of 271,630 EST sequences from different cDNA libraries, representing 23 different barley varieties, resulted in the generation of 56,302 tentative consensus sequences. In all, 8171 of these unigene sequences are members of clusters with six or more ESTs. By applying a novel SNP detection algorithm (SNiPpER) to these sequences, we identified 3069 candidate inter-varietal SNPs. In order to verify these candidate SNPs, we selected a small subset of 63 present in 36 ESTs. Of the 63 SNPs selected, we were able to validate 54 (86%) using a direct sequencing approach. For further verification, 28 ESTs were mapped to distinct loci within the barley genome. The polymorphism information content (PIC) and nucleotide diversity (π) values of the SNPs identified by the SNiPpER algorithm are significantly higher than those that were obtained by random sequencing. This demonstrates the efficiency of our strategy for SNP identification and the cost-efficient development of EST-based SNP markers. The first two authors contributed equally to this work.
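
The abstract above reports PIC values without stating how they were computed. As a rough illustration only, the sketch below uses the commonly cited definition of Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i are allele frequencies at one locus. The function name `pic` and the choice of this particular formula are assumptions, not taken from the paper.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content for one locus (assumed Botstein definition).

    freqs: allele frequencies at the locus; they must sum to 1.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Sum of squared allele frequencies (expected homozygosity).
    homozygosity = sum(p * p for p in freqs)
    # Correction term summed over every unordered pair of distinct alleles.
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross


# A biallelic SNP with equal allele frequencies gives the biallelic maximum.
balanced_snp = pic([0.5, 0.5])
# A skewed SNP is less informative.
skewed_snp = pic([0.9, 0.1])
```

Under this definition a biallelic marker such as a SNP cannot exceed a PIC of 0.375, which is worth keeping in mind when SNP PIC values are compared with those of multi-allelic markers such as SSRs.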

4.
L D Chaves  J A Rowe  K M Reed 《Génome》2005,48(1):12-17
Genome characterization and analysis is an imperative step in identifying and selectively breeding for improved traits of agriculturally important species. Expressed sequence tags (ESTs) represent a transcribed portion of the genome and are an effective way to identify genes within a species. Downstream applications of EST projects include DNA microarray construction and interspecies comparisons. In this study, 694 ESTs were sequenced and analyzed from a library derived from a 24-day-old turkey embryo. The 437 unique sequences identified were divided into 76 assembled contigs and 361 singletons. The majority of significant comparative matches occurred between the turkey sequences and sequences reported from the chicken. Whole genome sequence from the chicken was used to identify potential exon-intron boundaries for selected turkey clones and intron-amplifying primers were developed for sequence analysis and single nucleotide polymorphism (SNP) discovery. Identified SNPs were genotyped for linkage analysis on two turkey reference populations. This study significantly increases the number of EST sequences available for the turkey.

5.
The generation of large numbers of partial cDNA sequences, or expressed sequence tags (ESTs), has provided a method with which to sample a large number of genes from an organism. More than 25,000 Arabidopsis thaliana ESTs have been deposited in public databases, producing the largest collection of ESTs for any plant species. We describe here the application of a method of reducing redundancy and increasing information content in this collection by grouping overlapping ESTs representing the same gene into a "contig" or assembly. The increased information content of these assemblies allows more putative identifications to be assigned based on the results of similarity searches with nucleotide and protein databases. The results of this analysis indicate that sequence information is available for approximately 12,600 nonoverlapping ESTs from Arabidopsis. Comparison of the assemblies with 953 Arabidopsis coding sequences indicates that up to 57% of all Arabidopsis genes are represented by an EST. Clustering analysis of these sequences suggests that between 300 and 700 gene families are represented by between 700 and 2000 sequences in the EST database. A database of the assembled sequences, their putative identifications, and cellular roles is available through the World Wide Web.

6.
In an effort to expand the Gossypium hirsutum L. (cotton) expressed sequence tag (EST) database, ESTs representing a variety of tissues and treatments were sequenced. Assembly of these sequences with ESTs already in the EST database (dbEST, GenBank) identified 9675 cotton sequences not present in GenBank. Statistical analysis of a subset of these ESTs identified genes likely differentially expressed in stems, cotyledons, and drought-stressed tissues. Annotation of the differentially expressed cDNAs tentatively identified genes involved in lignin metabolism, starch biosynthesis and stress response, consistent with pathways likely to be active in the tissues under investigation. Simple sequence repeats (SSRs) were identified among these ESTs, and an inexpensive method was developed to screen genomic DNA for the presence of these SSRs. At least 69 SSRs potentially useful in mapping were identified. Selected amplified SSRs were isolated and sequenced. The sequences corresponded to the EST containing the SSRs, confirming that these SSRs will potentially map the gene represented by the EST. The ESTs containing SSRs were annotated to help identify the genes that may be mapped using these markers.

7.
8.
An expressed sequence tag (EST) approach was used to study the genome of two developmental stages of the lone star tick, Amblyomma americanum. cDNA libraries were constructed from the larval and adult stages of A. americanum. In total, 1942 ESTs were sequenced (1462 adult ESTs and 480 larval ESTs) and analyzed using bioinformatic programs. Contig assembly using the CAPII program revealed 11% and 15% redundancy of sequences in the larval and adult ESTs, respectively. Of the 1942 ESTs, 1738 sequences were considered quality sequences and of these, 771 or approximately 44.4% of the sequences were putatively identified based on amino acid identity using the protein Basic Local Alignment Search Tool (BLAST) algorithm. Putatively identified sequences were classified according to their predicted gene function. In total, 967 sequences, or 55.6% of the quality sequences, had limited or no protein similarity to previously identified gene products. Sequences lacking protein homology were analyzed using an automated sequence annotation system for predicted protein characteristics such as open reading frames, signal peptides, protein motifs, and transmembrane regions. In this paper we describe the sequencing of the largest number of ESTs obtained from an arachnid species to date and the subsequent detailed analysis of these sequences.

9.
A lambdaZAP Express cDNA library was constructed with mRNA obtained from immature miracidia within eggs, hatched miracidia, and sporocysts of Echinostoma paraensei. This cDNA library was amplified and 213 expressed sequence tag (EST) sequences (averaging 466 nucleotides in length) were obtained. The mean percentage of unresolved bases within the EST sequences was 0.4%, ranging from 0 to 4.6%. The 213 ESTs represent 151 unique messages. BLAST (version 2.0.8) analysis disclosed that 64 unique E. paraensei messages (42.4%) had significant similarities (BLAST score ≤ e-5), at deduced amino acid or nucleotide levels, with known sequences in the nonredundant GenBank databases or the dbEST database (NCBI). The remainder, 57.6% of the unique EST-encoded messages, scored nonsignificant hits. Most of the E. paraensei messages that could be assigned a cellular role based on sequence similarities were involved in gene/protein expression. Several ESTs scored highest similarities with sequences obtained from trematode species. A total of 22,560 nucleotides present in open reading frames from ESTs that aligned with known sequences was used to determine codon usage for E. paraensei. Analysis of a subset of eight ESTs that contained full-length open reading frames did not reveal a bias in codon usage. Also, EST sequences were found to contain 3' untranslated regions with an average length of 69.9 ± 88.4 nucleotides (n = 46). The EST sequences were submitted to GenBank/dbEST, adding to the 51 available Echinostoma-derived sequences, to provide reference information for both phylogenetic analysis and study of general trematode biology.

10.
Xin D  Sun J  Wang J  Jiang H  Hu G  Liu C  Chen Q 《Molecular biology reports》2012,39(9):9047-9057
Microsatellites, or simple sequence repeats (SSRs), are very useful molecular markers for a number of plant species. We used a new publicly available module (TROLL) to extract microsatellites from the public database of soybean expressed sequence tag (EST) sequences. A total of 12,833 sequences containing di- to penta-type SSRs were identified from 200,516 non-redundant soybean ESTs. On average, one SSR was found per 7.25 kb of EST sequences, with the tri-nucleotide motifs being the most abundant. Primer sequences flanking the SSR motifs were successfully designed for 9,638 soybean ESTs using the software primer3.0 and only 59 pairs of them were found in earlier studies. We synthesized 124 pairs of the primers to determine the polymorphism and heterozygosity among eight genotypes of soybean cultivars, which represented a wide range of the cultivated soybean cultivars. PCR amplification products with anticipated SSRs were obtained with 81 pairs of primers; 36 PCR products appeared to be homozygous and the remaining 45 PCR products appeared to be heterozygous and displayed polymorphism among the eight cultivars. We further analysed the EST sequences containing 45 polymorphic EST-SSR markers using the programs BLASTN and BLASTX. Sequence alignment showed that 29 ESTs have homologous sequences and 15 ESTs could be classified into a Uni-gene cluster with comparatively convincing protein products. Among these 15 ESTs belonging to a Uni-gene cluster, 9 SSRs were located in 3'-UTR, 4 SSRs were located in the intron region and 2 SSRs were located in the CDS region. None of these SSRs was located in the 5'-UTR. These novel SSRs identified in the ESTs of soybean provide useful information for gene mapping and cloning in future studies.
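
To make the idea of extracting di- to penta-nucleotide SSRs from EST sequences concrete, here is a minimal regex-based sketch. It is not the TROLL module used in the study and does not reproduce its algorithm; the function names and the minimum repeat counts in `MIN_REPEATS` are illustrative assumptions, since the abstract does not give the thresholds used.

```python
import re

# Assumed minimum number of tandem copies per motif length (di- to penta-);
# these thresholds are illustrative, not taken from the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA', 'ACAC')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq):
    """Return a sorted list of (start, motif, copies) for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in sorted(MIN_REPEATS.items()):
        # A k-base motif followed by at least (min_rep - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Hypothetical EST fragment containing an (AG)7 and a (CTT)5 repeat.
example = "GGC" + "AG" * 7 + "TTC" + "CTT" * 5 + "GA"
ssrs = find_ssrs(example)
```

The sketch only detects perfect repeats and reports 0-based start positions. Dedicated SSR-mining tools typically also handle imperfect and compound repeats, which this example deliberately leaves out.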

11.
12.
Simple Sequence Repeats (SSRs) developed from Expressed Sequence Tags (ESTs), known as EST-SSRs, are a widely used and potentially valuable source of gene-based markers because of their high cross-taxon portability and their rapid, inexpensive development. The EST sequence information in publicly available databases is increasing at a fast rate, and emerging computational approaches offer a better way to develop SSR markers from ESTs than conventional methods. In the present study, 12,851 EST sequences of Camellia sinensis, downloaded from the National Center for Biotechnology Information (NCBI), were mined for the development of microsatellites. After preprocessing and assembly with various computational tools, 6148 non-redundant EST sequences (4779 singletons and 1369 contigs) were obtained. Out of a total of 3822.68 kb of sequence examined, 1636 EST sequences (26.61%) containing 2371 SSRs were detected, a density of 1 SSR per 1.61 kb, leading to the development of 245 primer pairs. These mined EST-SSR markers will help further studies of variability, mapping, and evolutionary relationships in Camellia sinensis. In addition, the developed SSRs can also be applied in studies across species.
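
The summary statistics in this abstract follow directly from the reported counts. The short check below recomputes them; the variable names are illustrative, and the only inputs are the numbers quoted in the abstract.

```python
# Counts quoted in the Camellia sinensis abstract.
total_sequence_kb = 3822.68      # total non-redundant EST sequence examined
non_redundant_ests = 6148        # 4779 singletons + 1369 contigs
ssr_containing_ests = 1636
total_ssrs = 2371

# SSR density: kilobases of sequence per SSR.
kb_per_ssr = total_sequence_kb / total_ssrs

# Share of non-redundant ESTs that contain at least one SSR.
pct_ssr_ests = 100 * ssr_containing_ests / non_redundant_ests

print(f"1 SSR per {kb_per_ssr:.2f} kb; {pct_ssr_ests:.2f}% of ESTs contain SSRs")
```

Both values agree with the figures reported in the abstract (1 SSR per 1.61 kb and 26.61%).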

13.
Construction and application of a PC/Linux-based system for in silico extension of nucleic acid sequences   Total citations: 5, self-citations: 0, citations by others: 5
Obtaining the full-length cDNA sequence of a novel gene is often a difficult problem for molecular biologists. The Human Genome Project and related projects have generated large numbers of expressed sequence tags (ESTs), and with suitable bioinformatics algorithms these EST sequences can often be used to extend fragments of novel genes. Using the Linux operating system, the Blast and Phrap software, and an EST database, we built a system for in silico extension of EST sequences on a personal computer. The system was applied to 11386 EST sequences and 511 full-insert (partial-length) cDNA sequences from human fetal liver. The results showed that 8373 EST sequences and 389 cDNA sequences were extended to varying degrees, and some of the results were confirmed by RACE experiments. The system can extend EST sequences efficiently and at scale, and can provide important clues for obtaining the full-length cDNA sequences of novel genes experimentally.
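
The core idea of in silico extension, lengthening a seed sequence with overlapping ESTs, can be illustrated with a toy example. The sketch below is not the Blast/Phrap system described in the abstract: it assumes exact suffix-prefix overlaps, extends in one direction only, and ignores sequencing errors and reverse complements. The function names and the `min_len` parameter are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b, or 0 if shorter than min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def extend_right(seed, ests, min_len=20):
    """Greedily extend seed to the right with ESTs that overlap its current 3' end."""
    contig = seed
    remaining = list(ests)
    while True:
        best_n, best_idx = 0, None
        for i, est in enumerate(remaining):
            n = overlap(contig, est, min_len)
            # Only accept ESTs that add new bases beyond the overlap.
            if n > best_n and len(est) > n:
                best_n, best_idx = n, i
        if best_idx is None:
            return contig
        est = remaining.pop(best_idx)
        contig += est[best_n:]


# Hypothetical toy data with a short overlap threshold for readability.
seed = "ACGTACGGA"
ests = ["CGGATTTC", "TTTCAGGA", "GGGGGGGG"]
extended = extend_right(seed, ests, min_len=4)
```

In a real pipeline, similarity search would select candidate ESTs and an assembler would build the consensus while tolerating mismatches; the greedy exact-match loop here only shows why overlapping ESTs allow a fragment to grow beyond its original length.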

14.
Catfishes are commercially important fish for both the fisheries and aquaculture industries. Clarias batrachus, an Indian catfish species, is economically important owing to its high demand. A normalized cDNA library was constructed from the spleen of the Indian catfish to identify genes associated with immune function. A total of 1,937 ESTs were submitted to GenBank, with an average read length of approximately 700 bp. Clustering analysis of the ESTs yielded 1,698 unique sequences, including 184 contigs and 1,514 singletons. Homology searches against GenBank found significant homology to known genes for 576 (34%) ESTs, including similarity to functionally annotated unigenes for 158 ESTs. Additionally, 433 ESTs revealed similarity to unigenes and ESTs in dbEST, but the remaining 658 EST sequences (39%) did not match any sequence in GenBank. Of a total of 1,698 ESTs generated, 65 ESTs were found to be associated with immune functions. Gene Ontology and KEGG pathway analyses of C. batrachus ESTs collectively revealed a preponderance of immune-relevant pathways, apart from the presence of pathways involved in protein processing, localization, folding and degradation. This study constitutes the first EST analysis of a lymphoid organ in this aquaculturally important Indian catfish species and could pave the way for further research on immune-related genes and functional genomics in this catfish.

15.
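To develop EST-SSR functional markers in asparagus, 8590 asparagus (Asparagus officinalis L.) EST sequences from the NCBI public database were searched for simple sequence repeats (SSRs). After removal of redundant sequences, 8377 non-redundant sequences remained. A total of 469 EST-SSRs were mined from the non-redundant sequences, with one SSR occurring every 14.80 kb on average. Among all repeat motifs, dinucleotide repeats accounted for the largest share, 40.51% (190/469), followed by trinucleotide repeats at 34.97% (164/469) and hexanucleotide repeats at 21.11% (99/469). Of all motifs, CT/AG was the most frequent, occurring 62 times and accounting for 13.22% (62/469) of all repeat motifs. Thirty SSR-containing EST sequences were selected, primers were designed with the primer5 software, and the SSR loci were amplified. Of these, 27 primer pairs yielded amplification products and 24 gave clear, reliable target bands, accounting for 80% of the primers; the number of alleles detected in asparagus was fairly high, averaging 4.93 per primer pair. The development of these EST-SSR markers will aid research on population genetic diversity, genetic map construction, gene mapping, molecular markers and pedigree analysis in asparagus.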

16.
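To extend the use of molecular markers in the analysis and identification of oat germplasm resources, 25376 EST (expressed sequence tag) sequences from public databases were used to develop and apply EST-SSR functional markers for oat. After assembly and redundancy removal, the 25376 EST sequences yielded 11618 sequences, from which 556 EST sequences containing SSRs with different repeat motifs, relatively high repeat numbers and relatively long lengths were selected for primer design. Fifty oat EST-SSR primer pairs were developed, and screening yielded 40 effective EST-SSR primer pairs. Four of these primer pairs were used for PCR amplification and product sequencing in 5 oat germplasm accessions; the results showed that the polymorphism of the amplified bands was caused by SSR differences. Genetic diversity analysis of 15 hexaploid oat germplasm accessions with the 40 EST-SSR primer pairs amplified 89 alleles in total, an average of 2.23 alleles per primer pair. UPGMA cluster analysis showed that the 15 hexaploid oat accessions grouped into 3 branches at a Dice coefficient of 0.93, essentially clustering by species and, within the same species, by geographic origin. When the 40 EST-SSR primer pairs were used to determine genome ploidy in 31 oat germplasm accessions of unclear genetic background, these accessions were found to possibly include new tetraploid and diploid oat resources. The oat EST-SSR functional markers developed in this study will play an important role in oat genetic diversity analysis, genetic map construction and genome identification among species within the genus Avena.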

17.
For comprehensive analysis of genes expressed in the model dicotyledonous plant, Arabidopsis thaliana, expressed sequence tags (ESTs) were accumulated. Normalized and size-selected cDNA libraries were constructed from aboveground organs, flower buds, roots, green siliques and liquid-cultured seedlings, respectively, and a total of 14,026 5'-end ESTs and 39,207 3'-end ESTs were obtained. The 3'-end ESTs could be clustered into 12,028 non-redundant groups. Similarity search of the non-redundant ESTs against the public non-redundant protein database indicated that 4816 groups show similarity to genes of known function, 1864 to hypothetical genes, and the remaining 5348 are novel sequences. Gene coverage by the non-redundant ESTs was analyzed using the annotated genomic sequences of approximately 10 Mb on chromosomes 3 and 5. A total of 923 regions were hit by at least one EST, among which only 499 regions were hit by the ESTs deposited in the public database. The result indicates that the EST source generated in this project complements the EST data in the public database and facilitates new gene discovery.

18.
To characterize genes whose expression is induced in carbon-stress conditions, 12,969 and 13,450 5'-end expressed sequence tags (ESTs) were generated from cells grown in low-CO2 and high-CO2 conditions of the unicellular green alga, Chlamydomonas reinhardtii. These ESTs were clustered into 4436 and 3566 non-redundant EST groups, respectively. Comparison of their sequences with those of 3433 non-redundant ESTs previously generated from the cells under the standard growth condition indicated that 2665 and 1879 EST groups occurred only in the low-CO2 and high-CO2 populations, respectively. It was also noted that 96.2% and 96.0% of the cDNA species respectively obtained from the low-CO2 and high-CO2 conditions had no similar EST sequence deposited in the public databases. The EST species identified only in the low-CO2 treated cells included genes previously reported to be expressed specifically in low-CO2 acclimatized cells, suggesting that the ESTs generated in this study will be a useful source for analysis of genes related to carbon-stress acclimatization. The sequence information and search results of each clone will appear at the web site: http://www.kazusa.or.jp/en/plant/chlamy/EST/.

19.
Plant genomics projects involving model species and many agriculturally important crops are resulting in a rapidly increasing database of genomic and expressed DNA sequences. The publicly available collection of expressed sequence tags (ESTs) from several grass species can be used in the analysis of both structural and functional relationships in these genomes. We analyzed over 260000 EST sequences from five different cereals for their potential use in developing simple sequence repeat (SSR) markers. The frequency of SSR-containing ESTs (SSR-ESTs) in this collection varied from 1.5% for maize to 4.7% for rice. In addition, we identified several ESTs that are related to the SSR-ESTs by BLAST analysis. The SSR-ESTs and the related sequences were clustered within each species in order to reduce the redundancy and to produce a longer consensus sequence. The consensus and singleton sequences from each species were pooled and clustered to identify cross-species matches. Overall a reduction in the redundancy by 85% was observed when the resulting consensus and singleton sequences (3569) were compared to the total number of SSR-EST and related sequences analyzed (24606). This information can be useful for the development of SSR markers that can amplify across the grass genera for comparative mapping and genetics. Functional analysis may reveal their role in plant metabolism and gene evolution.

20.
With the advent of high-throughput sequencing technology, sequences from many genomes are being deposited in public databases at a brisk rate. Open access to a large amount of expressed sequence tag (EST) data in the public databases has provided a powerful platform for simple sequence repeat (SSR) development in species where sequence information is not available. SSRs are markers of choice for their high reproducibility, abundant polymorphism and high inter-specific transferability. Mining SSRs from ESTs requires several high-throughput computational tools that must be executed individually, which is computationally intensive and time consuming. To reduce the time lag and to streamline the cumbersome process of SSR mining from ESTs, we have developed a user-friendly, web-based EST-SSR pipeline, "EST-SSR-MARKER PIPELINE (ESMP)". This pipeline integrates EST pre-processing, clustering, assembly and subsequent mining of SSRs from assembled EST sequences. The mining of SSRs from ESTs provides valuable information on the abundance of SSRs in ESTs and will facilitate the development of markers for genetic analysis and related applications such as marker-assisted breeding. AVAILABILITY: The database is available for free at http://bioinfo.aau.ac.in/ESMP.
