Similar Documents
20 similar documents found; search time: 15 ms
1.
We performed random sequencing of cDNAs from nine biologically or industrially important cultures of the industrially valuable fungus Aspergillus oryzae to obtain expressed sequence tags (ESTs). In total, 21,446 raw ESTs were accumulated and assembled into 7,589 non-redundant consensus sequences (contigs). Of all contigs, 5,491 (72.4%) were derived from only one culture; these included 4,735 (62.4%) singletons, i.e. lone ESTs that overlap with no others. These data show that using cultures grown under various conditions as cDNA sources enables efficient collection of ESTs. BLAST searches against public databases showed that 2,953 (38.9%) of the EST contigs had significant similarity to deposited sequences of known function, 793 (10.5%) were similar to hypothetical proteins, and the remaining 3,843 (50.6%) showed no significant similarity to any sequence in the databases. Culture-specific contigs were extracted on the basis of EST frequency normalized by the total number of ESTs for each culture condition. In addition, contig sequences were compared with the sequence sets of the eukaryotic orthologous groups (KOGs) and classified into KOG functional categories.
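A minimal Python sketch of the culture-specificity step. The abstract only states that EST frequency was normalized by each culture's total; the "concentration" rule and the `min_share` cutoff below are hypothetical, and the sketch assumes every EST (singletons included) is assigned to exactly one contig, so per-culture totals can be summed from the count table.

```python
def normalize_counts(counts):
    """Divide each contig's raw EST count by the total ESTs in that culture.

    counts: {contig_id: {culture_name: raw EST count}}
    """
    totals = {}
    for per_culture in counts.values():
        for culture, n in per_culture.items():
            totals[culture] = totals.get(culture, 0) + n
    return {
        contig: {c: (n / totals[c] if totals[c] else 0.0) for c, n in per_culture.items()}
        for contig, per_culture in counts.items()
    }


def culture_specific(counts, min_share=0.9):
    """Map contig -> culture when one culture holds >= min_share of the contig's
    summed normalized frequency (min_share is a hypothetical cutoff)."""
    specific = {}
    for contig, freqs in normalize_counts(counts).items():
        total = sum(freqs.values())
        if total == 0:
            continue
        culture, top = max(freqs.items(), key=lambda kv: kv[1])
        if top / total >= min_share:
            specific[contig] = culture
    return specific


counts = {
    "contig1": {"A": 10, "B": 0},
    "contig2": {"A": 5, "B": 5},
    "contig3": {"B": 20},
}
print(culture_specific(counts))  # {'contig1': 'A', 'contig3': 'B'}
```

Normalizing first matters because libraries of different depth would otherwise bias raw counts toward the most heavily sequenced culture.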

2.
3.
Ustilago maydis is an important model system for the plant pathogenic smut and rust fungi. Establishing genomic resources is critical to the continued development of this model. We constructed a cDNA library from a forced diploid culture of U. maydis growing as filaments and generated 7,455 ESTs that assemble into 3,074 contiguous sequences. This represents as much as 46% of the coding capacity predicted for U. maydis. BLAST searches with a similarity cutoff of E …

4.
EST sequencing of Onychophora and phylogenomic analysis of Metazoa   (Total citations: 4; self-citations: 0; citations by others: 4)
Onychophora (velvet worms) are a small animal taxon considered to be related to Euarthropoda. We obtained 1,873 5′ cDNA sequences (expressed sequence tags, ESTs) from the velvet worm Epiperipatus sp., which were assembled into 833 contigs. BLAST similarity searches revealed that 51.9% of the contigs had matches in the protein databases with expectation values lower than 10⁻⁴. Most ESTs had their best hit among proteins from either Chordata or Arthropoda (approximately 40% each). The ESTs included sequences of 27 ribosomal proteins. Orthologous sequences from 28 other species spanning a broad range of phyla were obtained from the databases, including other EST projects. A concatenated amino acid alignment of 5,021 positions was constructed, which was reduced to 4,259 positions after problematic regions were removed. Bayesian and maximum likelihood methods place Epiperipatus within the monophyletic Ecdysozoa (Onychophora, Arthropoda, Tardigrada and Nematoda), but its exact relationship to the Euarthropoda remained unresolved. The "Articulata" concept was not supported. Tardigrada and Nematoda formed a well-supported monophylum, suggesting that Tardigrada are actually Cycloneuralia. In agreement with previous studies, we demonstrate that random sequencing of cDNAs yields sequence information suitable for phylogenomic approaches to resolving metazoan relationships.
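The supermatrix step can be illustrated with a short Python sketch that concatenates per-gene alignments and then drops gap-rich columns. The abstract does not state how "problematic regions" were identified; the simple gap-fraction filter and its 0.5 threshold here are assumptions, not the authors' trimming method.

```python
def concatenate(alignments, taxa, missing="?"):
    """Build a supermatrix from per-gene alignments.

    alignments: list of {taxon: aligned sequence}; all sequences within one gene
    must have equal length. Taxa absent from a gene are padded with `missing`.
    """
    rows = {t: [] for t in taxa}
    for aln in alignments:
        length = len(next(iter(aln.values())))
        for t in taxa:
            seq = aln.get(t, missing * length)
            if len(seq) != length:
                raise ValueError(f"{t}: expected length {length}, got {len(seq)}")
            rows[t].append(seq)
    return {t: "".join(parts) for t, parts in rows.items()}


def drop_gappy_columns(matrix, max_gap_fraction=0.5, gap_chars="-?X"):
    """Remove columns in which more than max_gap_fraction of taxa are gaps/missing."""
    taxa = list(matrix)
    length = len(matrix[taxa[0]])
    keep = [
        i for i in range(length)
        if sum(matrix[t][i] in gap_chars for t in taxa) / len(taxa) <= max_gap_fraction
    ]
    return {t: "".join(matrix[t][i] for i in keep) for t in taxa}


genes = [{"A": "MK-L", "B": "MKQL"}, {"A": "GG", "C": "GA"}]
matrix = concatenate(genes, ["A", "B", "C"])
trimmed = drop_gappy_columns(matrix)
print(trimmed)  # {'A': 'MKLGG', 'B': 'MKL??', 'C': '???GA'}
```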

5.
6.
7.
8.
Mining single-nucleotide polymorphisms from hexaploid wheat ESTs   (Total citations: 20; self-citations: 0; citations by others: 20)
Single-nucleotide polymorphisms (SNPs) represent a new form of functional marker, particularly when they are derived from expressed sequence tags (ESTs). A bioinformatics strategy was developed to discover SNPs within a large wheat EST database and to demonstrate their utility in genetic mapping and genetic diversity applications. A collection of more than 90,000 wheat ESTs was assembled into contiguous sequences (contigs), and 45 random contigs were then visually inspected to identify primer pairs capable of amplifying specific alleles. We estimate that homoeologue sequence variants occurred at a frequency of 1 in 24 bp, and that the frequency of SNPs between wheat genotypes was 1 SNP per 540 bp (θ = 0.0069). Furthermore, we estimate that one diagnostic SNP test can be developed from every contig with 10–60 EST members. Thus, EST databases are an abundant source of SNP markers. Polymorphism information content (PIC) for SNPs ranged from 0.04 to 0.50, and ESTs could be mapped onto a framework of microsatellite markers using segregating populations. The results show that SNPs in wheat can be discovered in ESTs, validated, and applied to conventional genetic studies.
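The authors inspected contigs visually; purely as an automated illustration, the hypothetical Python function below flags alignment columns of an assembled contig where two or more bases are each supported by a minimum number of ESTs. The threshold of 2 is an assumption meant to suppress single-read sequencing errors. In hexaploid wheat such a column may reflect a homoeologue difference rather than a SNP between genotypes; separating the two requires knowing which genotype each EST came from, which this sketch does not model.

```python
from collections import Counter


def find_snps(aligned_reads, min_allele_count=2):
    """Report alignment columns where at least two bases are each supported by
    >= min_allele_count reads. '-' (or any non-ACGT character) is ignored."""
    snps = []
    for i in range(len(aligned_reads[0])):
        bases = Counter(r[i] for r in aligned_reads if r[i] in "ACGT")
        alleles = {b: n for b, n in sorted(bases.items()) if n >= min_allele_count}
        if len(alleles) >= 2:
            snps.append((i, alleles))
    return snps


reads = ["ACGTA", "ACGTA", "ATGTA", "ATGTC"]
print(find_snps(reads))  # [(1, {'C': 2, 'T': 2})]
```

Column 4 is not reported because its C allele appears in only one read.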

9.
10.
Orchids are among the most ecologically and evolutionarily significant plants, and the Orchidaceae is one of the largest families of angiosperms. Genetic databases will be useful not only for gene discovery but also for future genome annotation. For this purpose, OrchidBase was established from 37,979,342 sequence reads collected from 11 in-house Phalaenopsis orchid cDNA libraries. Of these, 41,310 expressed sequence tags (ESTs) were obtained by Sanger sequencing, whereas 37,908,032 reads were obtained by next-generation sequencing (NGS) on both Roche 454 and Solexa/Illumina sequencers. These reads were assembled into 8,501 contigs and 76,116 singletons, yielding 84,617 non-redundant transcribed sequences with an average length of 459 bp. The database's analysis pipeline is an automated system written in Perl and C# that consists of the following components: automatic pre-processing of EST reads, assembly of raw sequences, annotation of the assembled sequences, and storage of the analyzed information in SQL databases. A web application implemented with HTML and a Microsoft .NET Framework C# program supports browsing and querying the database, creating dynamic web pages on the client side, analyzing gene ontology (GO) terms, and mapping annotated enzymes to KEGG pathways. The online resources for putative annotation can be searched either by text or with BLAST, and the results can be explored on the website and downloaded. The establishment of OrchidBase thus provides researchers with a high-quality genetic resource for data mining and will facilitate efficient experimental studies of orchid biology and biotechnology. The OrchidBase database is freely available at http://lab.fhes.tn.edu.tw/est.
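The final pipeline stage, storage of analyzed sequences in SQL, might look roughly like the sketch below. It uses Python's built-in `sqlite3` as a stand-in; the table and column names are hypothetical and do not describe OrchidBase's actual schema or database engine.

```python
import sqlite3

SCHEMA = """
CREATE TABLE transcript (
    id       TEXT PRIMARY KEY,
    kind     TEXT NOT NULL CHECK (kind IN ('contig', 'singleton')),
    sequence TEXT NOT NULL
);
CREATE TABLE go_annotation (
    transcript_id TEXT NOT NULL REFERENCES transcript(id),
    go_term       TEXT NOT NULL
);
"""


def build_db(transcripts, annotations):
    """transcripts: [(id, kind, sequence)]; annotations: [(transcript_id, go_term)]."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO transcript VALUES (?, ?, ?)", transcripts)
    conn.executemany("INSERT INTO go_annotation VALUES (?, ?)", annotations)
    conn.commit()
    return conn


def summary(conn):
    """(number of non-redundant transcripts, mean length in bp)."""
    return conn.execute("SELECT COUNT(*), AVG(LENGTH(sequence)) FROM transcript").fetchone()


def transcripts_with_term(conn, go_term):
    """IDs of transcripts annotated with a given GO term."""
    rows = conn.execute(
        "SELECT t.id FROM transcript AS t "
        "JOIN go_annotation AS g ON g.transcript_id = t.id "
        "WHERE g.go_term = ? ORDER BY t.id",
        (go_term,),
    )
    return [row[0] for row in rows]


conn = build_db(
    [("c1", "contig", "ACGTACGT"), ("s1", "singleton", "ACGT")],
    [("c1", "GO:0008152"), ("s1", "GO:0008152"), ("c1", "GO:0005737")],
)
print(summary(conn))                              # (2, 6.0)
print(transcripts_with_term(conn, "GO:0008152"))  # ['c1', 's1']
```

Keeping contigs and singletons in one table with a `kind` column lets the non-redundant count and mean length be computed in a single query.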

11.
12.
13.
14.
Simple sequence repeats (SSRs) developed from expressed sequence tags (ESTs), known as EST-SSRs, are among the most widely used and potentially valuable sources of gene-based markers because of their high cross-taxon portability and their rapid, inexpensive development. EST sequence information in public databases is growing rapidly, and computational approaches offer a better alternative to conventional methods for developing SSR markers from ESTs. In the present study, 12,851 EST sequences of Camellia sinensis downloaded from the National Center for Biotechnology Information (NCBI) were mined to develop microsatellite markers. After preprocessing and assembly with various computational tools, 6,148 non-redundant EST sequences (4,779 singletons and 1,369 contigs) were obtained. Of the 3,822.68 kb of sequence examined, 1,636 (26.61%) EST sequences contained a total of 2,371 SSRs, a density of 1 SSR per 1.61 kb, from which 245 primer pairs were developed. These EST-SSR markers will support further studies of genetic variability, mapping, and evolutionary relationships in Camellia sinensis. In addition, the developed SSRs can be applied in cross-species studies.
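The SSR-mining step can be sketched with regular expressions. The minimum repeat counts per motif length below are MISA-style values chosen as an assumption (the abstract does not give the study's criteria), and only perfect repeats are detected. The density helper reproduces the reported figure: 3,822.68 kb / 2,371 SSRs ≈ 1.61 kb per SSR.

```python
import re

# Minimum number of tandem copies per motif length. These MISA-style thresholds
# are an assumption for illustration, not the criteria used in the study.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """False for motifs that are themselves repeats of a shorter unit (e.g. 'ATAT')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) for perfect SSRs; shorter motifs claim bases first."""
    seq = seq.upper()
    claimed = [False] * len(seq)
    hits = []
    for k in sorted(min_repeats):
        # A k-mer followed by at least (min - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats[k] - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_primitive(motif) or any(claimed[m.start():m.end()]):
                continue
            for i in range(m.start(), m.end()):
                claimed[i] = True
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def kb_per_ssr(num_ssrs, total_bp):
    """Average kilobases of sequence per SSR."""
    return total_bp / 1000 / num_ssrs


seq = "CC" + "AG" * 7 + "TTT" + "A" * 11 + "C"
print(find_ssrs(seq))                          # [(2, 'AG', 7), (19, 'A', 11)]
print(round(kb_per_ssr(2371, 3_822_680), 2))   # 1.61
```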

15.
16.
To identify new vaccine candidates, Eimeria tenella expressed sequence tags (ESTs) from public databases were analysed for secretory molecules using a specially developed automated in silico strategy termed DNAsignalP. A total of 12,187 ESTs were clustered into 2,881 contigs, followed by a blastx search that identified a significant number of E. tenella contigs with homology to entries in public databases. The amino acid sequences of the corresponding homologous proteins were analysed for an N-terminal signal sequence using the SignalP algorithm. The resulting list of 84 entries comprised 51 contigs whose deduced proteins showed homology to proteins of apicomplexan parasites. Based on function or localisation, we selected candidate proteins classified as (i) secreted proteins of apicomplexan parasites, (ii) secreted enzymes, and (iii) transport and signalling proteins. To verify our strategy experimentally, we used a functional complementation system in yeast and confirmed that five selected candidate proteins were indeed secreted. Our approach thus represents an efficient method for identifying secretory and surface proteins from EST databases.
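The selection logic of a DNAsignalP-style screen can be sketched as follows: take each contig's blastx best-hit protein and keep the contig if that protein is predicted to carry an N-terminal signal sequence. SignalP itself cannot be reproduced here, so the predictor is a deliberately crude hydrophobic-run heuristic (hypothetical residue set, window and run length) that only marks where a real predictor would plug in.

```python
# Residues treated as hydrophobic by this toy rule (an assumption).
HYDROPHOBIC = set("AILMFVWC")


def has_signal_like_region(protein, window=30, min_run=8):
    """Toy stand-in for a signal-peptide predictor: True if the first `window`
    residues contain a run of >= min_run hydrophobic residues. NOT SignalP."""
    longest = run = 0
    for aa in protein[:window].upper():
        run = run + 1 if aa in HYDROPHOBIC else 0
        longest = max(longest, run)
    return longest >= min_run


def screen_contigs(best_hits, proteins, predictor=has_signal_like_region):
    """Keep contigs whose blastx best-hit protein is predicted to carry a signal peptide.

    best_hits: {contig_id: protein_id}; proteins: {protein_id: amino-acid sequence}.
    """
    return sorted(
        contig for contig, prot in best_hits.items()
        if prot in proteins and predictor(proteins[prot])
    )


proteins = {
    "P1": "MKKLLLLAVLVALSAQA" + "G" * 20,
    "P2": "MSTEEKRPQDSGE" * 3,
}
hits = {"contig1": "P1", "contig2": "P2", "contig3": "P9"}
print(screen_contigs(hits, proteins))  # ['contig1']
```

Passing the predictor as a parameter keeps the screening logic independent of whichever signal-peptide tool is actually used.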

17.
A large-scale genomic resource for kelampayan was generated from a developing-xylem cDNA library. A total of 6,622 high-quality expressed sequence tags (ESTs) were generated by high-throughput 5′ EST sequencing of cDNA clones. The ESTs were analyzed and assembled into 4,728 xylogenesis unigenes (2,100 contigs and 2,628 singletons). About 59.3% of the ESTs were assigned putative identities, whereas 40.7% showed no significant similarity to any sequence in GenBank. Notably, most genes involved in lignin biosynthesis and several other cell wall biosynthesis genes were identified in the kelampayan EST database. The genes identified in this study are candidates for functional genomics and association genetics studies in kelampayan aimed at producing high-value forests.

18.
The generation of large numbers of partial cDNA sequences, or expressed sequence tags (ESTs), has provided a method with which to sample a large number of genes from an organism. More than 25,000 Arabidopsis thaliana ESTs have been deposited in public databases, producing the largest collection of ESTs for any plant species. We describe here the application of a method of reducing redundancy and increasing information content in this collection by grouping overlapping ESTs representing the same gene into a "contig" or assembly. The increased information content of these assemblies allows more putative identifications to be assigned based on the results of similarity searches with nucleotide and protein databases. The results of this analysis indicate that sequence information is available for approximately 12,600 nonoverlapping ESTs from Arabidopsis. Comparison of the assemblies with 953 Arabidopsis coding sequences indicates that up to 57% of all Arabidopsis genes are represented by an EST. Clustering analysis of these sequences suggests that between 300 and 700 gene families are represented by between 700 and 2000 sequences in the EST database. A database of the assembled sequences, their putative identifications, and cellular roles is available through the World Wide Web.
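Grouping overlapping ESTs into assemblies amounts to finding connected components of an overlap graph. The toy Python sketch below uses exact suffix/prefix matches and union-find; real EST assemblers tolerate sequencing errors, consider reverse complements and avoid all-pairs comparison, so the `min_overlap` parameter and the exact-match rule are simplifying assumptions.

```python
def overlaps(a, b, min_len):
    """True if one read contains the other, or a suffix of one read of length
    >= min_len exactly equals a prefix of the other."""
    if a in b or b in a:
        return True
    for x, y in ((a, b), (b, a)):
        for k in range(min_len, min(len(x), len(y)) + 1):
            if x[-k:] == y[:k]:
                return True
    return False


def cluster_ests(ests, min_overlap=20):
    """Group read indices into assemblies via union-find over pairwise overlaps."""
    parent = list(range(len(ests)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(ests)):
        for j in range(i + 1, len(ests)):
            if overlaps(ests[i], ests[j], min_overlap):
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(ests)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


ests = ["ACGTACGGTT", "CGGTTAGCAT", "TTTTCCCC", "GCATGG"]
print(cluster_ests(ests, min_overlap=4))  # [[0, 1, 3], [2]]
```

Reads 0 and 3 share no direct overlap but end up in one assembly through read 1, which is how overlapping ESTs chain into a single contig.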

19.
Rattans serve as an important source of raw non-wood materials for the furniture and handicraft industries worldwide. However, little of their genomic sequence information is available in public databases. In this study, 2,528 good-quality expressed sequence tags (ESTs) were generated from a full-length cDNA library previously constructed from root, stem and male inflorescence tissues of Calamus simplicifolius C. F. Wei, a rattan species native to Hainan Island, China. The ESTs were assembled into 1,588 unigenes, comprising 1,221 singletons and 367 contigs. BlastX searches against the GenBank non-redundant protein database revealed that 1,248 (78.6%) unigenes had at least one significant match (E ≤ 10⁻⁵). Gene Ontology functional classification assigned 991, 669 and 977 of the unigenes to the cellular component, molecular function and biological process categories, respectively. A total of 71 simple sequence repeat (SSR) loci were developed from these ESTs, 65 of which were polymorphic across 19 rattan species representing three genera. The EST-SSRs showed high levels of cross-species and cross-genus transferability. For the polymorphic EST-SSR markers, the number of alleles per locus ranged from 2 to 25 (mean 11.1) and the polymorphic information content from 0.135 to 0.949 (mean 0.695). The EST sequences and EST-SSR primers have been deposited in the GenBank EST database (IDs JK838364–40891) and Probe database (IDs Pr16718978–9048, to be assigned).
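Two of the reported statistics can be computed as below. The abstract does not say which polymorphic information content (PIC) formula was used; this sketch assumes the Botstein et al. (1980) definition, which peaks at 0.375 for a biallelic locus. The E-value helper counts unigenes whose best BlastX hit meets the E ≤ 10⁻⁵ cutoff.

```python
from itertools import combinations


def pic_botstein(allele_counts):
    """Polymorphism information content (Botstein et al., 1980):
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    total = sum(allele_counts)
    p = [c / total for c in allele_counts]
    heterozygosity = 1 - sum(x * x for x in p)
    return heterozygosity - sum(2 * a * a * b * b for a, b in combinations(p, 2))


def fraction_with_hit(best_evalues, cutoff=1e-5):
    """Share of unigenes whose best BlastX E-value is <= cutoff (None = no hit)."""
    hits = sum(1 for e in best_evalues.values() if e is not None and e <= cutoff)
    return hits / len(best_evalues)


print(pic_botstein([10, 10]))      # 0.375
print(pic_botstein([5, 5, 5, 5]))  # 0.703125
print(fraction_with_hit({"u1": 1e-30, "u2": 1e-3, "u3": None, "u4": 1e-5}))  # 0.5
```

Applied to the rattan data, `fraction_with_hit` over all 1,588 unigenes would give 1,248 / 1,588 ≈ 0.786, the 78.6% in the abstract.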

20.

Copyright © Beijing Qinyun Science and Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417 (京ICP备09084417号)