Similar literature
20 similar articles found.
1.
2.

Background  

EST sequencing is a versatile approach for rapidly gathering protein-coding sequences. ESTs provide direct access to an organism's gene repertoire, bypassing the still error-prone procedure of gene prediction from genomic data. As a result, ESTs are often the only source of biological sequence data for taxa outside mainstream interest. The widespread use of ESTs in evolutionary studies, and particularly in molecular systematics, is still hindered by the lack of efficient and reliable approaches for automated ortholog prediction in ESTs. Existing methods either depend on a known species tree or cannot cope with redundancy in EST data.

3.

Background  

Expressed sequence tag (EST) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of a given size and to determine the gene discovery rate. These estimates form the basis for deciding whether to proceed with sequencing the library and, if so, serve as a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library.
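
To make the notion of a gene discovery rate concrete, here is a minimal sketch. The abstract does not say which estimator the authors use, so this swaps in the classical Good-Turing estimate (the fraction of ESTs whose gene has been seen exactly once) as an assumption. The function names and the toy sample are hypothetical, and the linear extrapolation is only a rough first-order guide for small future samples.

```python
from collections import Counter


def discovery_rate(gene_labels):
    """Good-Turing estimate of the chance that the next EST hits an unseen gene.

    gene_labels: one gene (cluster) identifier per sequenced EST.
    """
    n = len(gene_labels)
    if n == 0:
        raise ValueError("need at least one EST")
    counts = Counter(gene_labels)
    # Genes observed exactly once carry the information about unseen genes.
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


def expected_new_genes(gene_labels, m):
    """Crude first-order approximation for a future sample of m ESTs.

    Assumes the discovery rate stays constant, which overestimates the
    yield as m grows because the rate decays with further sequencing.
    """
    return m * discovery_rate(gene_labels)


# Hypothetical preliminary sample: 8 ESTs assigned to 5 genes.
sample = ["g1", "g1", "g2", "g3", "g3", "g3", "g4", "g5"]
rate = discovery_rate(sample)
```

A low rate suggests the library is already highly redundant, which is the kind of signal the abstract describes using to decide whether further sequencing is worthwhile.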

4.
5.
Maiti AK, Jorissen M, Bouvagnet P. Genome Biology 2001, 2(7): research0026.1-research0026.9

Background

Immotile cilia syndrome (ICS) or primary ciliary dyskinesia (PCD) is an autosomal recessive disorder in humans in which the beating of cilia and sperm flagella is impaired. Ciliated epithelial cell linings are present in many tissues. To understand ciliary assembly and motility, it is important to isolate those genes involved in the process.

Results

Total RNA was isolated from cultured ciliated nasal epithelial cells after in vitro ciliogenesis, and expressed sequence tags (ESTs) were generated. The functions and locations of 63 of these ESTs were derived by BLAST from two public databases. These ESTs fall into several classes. One group has high homology not only with the mitochondrial genome but also with one or more chromosomal DNAs, suggesting that very similar genes, or genes with very similar domains, are expressed from both mitochondrial and nuclear DNA. A second class comprises genes with complete homology with part of a known gene, suggesting that they are the same genes. A third group has partial homology with domains of known genes. A fourth group, constituting 33% of the ESTs characterized, has no significant homology with any gene or EST in the database.

Conclusions

We have shown that sufficient information about the location of ESTs could be derived electronically from the recently completed human genome sequences. This strategy of EST localization should be very useful for mapping and identifying new genes in the forthcoming human genome sequences, given the vast number of ESTs in the dbEST database.

6.
7.

Background

A large number of gene prediction programs for the human genome exist. These annotation tools use a variety of methods and data sources. In the recent ENCODE genome annotation assessment project (EGASP), some of the most commonly used and recently developed gene-prediction programs were systematically evaluated and compared on test data from the human genome. AUGUSTUS was among the tools that were tested in this project.

Results

AUGUSTUS can be used as an ab initio program, that is, as a program that uses only a single genomic sequence as input. In addition, it can combine information from the genomic sequence under study with external hints from various sources. For EGASP, we used genomic sequence alignments as well as alignments to expressed sequence tags (ESTs) and protein sequences as additional sources of information. Within the category of ab initio programs, AUGUSTUS predicted significantly more genes correctly than any other ab initio program. At the same time, it predicted the smallest number of false positive genes and the smallest number of false positive exons among all ab initio programs. The accuracy of AUGUSTUS could be further improved when additional extrinsic data, such as alignments to EST, protein and/or genomic sequences, were taken into account.

Conclusion

AUGUSTUS turned out to be the most accurate ab initio gene finder among the tested tools. Moreover it is very flexible because it can take information from several sources simultaneously into consideration.

8.
9.
10.
11.
In recent years, identification of alternative splicing (AS) variants has been gaining momentum. We developed AVATAR, a database for documenting AS using 5,469,433 human EST sequences and 26,159 human mRNA sequences. AVATAR contains 12,000 alternative splicing sites identified by mapping ESTs and mRNAs to the whole human genome sequence. AVATAR also contains AS information for 6 eukaryotes. We mapped EST alignment information into a graph model in which exons and introns are represented by vertices and edges, respectively. AVATAR can be queried using (1) gene names, (2) the number of identified AS events in a gene, (3) the minimal number of ESTs supporting a splicing site, and other search parameters. The system provides visualized AS information for queried genes.
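
The graph model described above can be sketched as follows. This is an illustrative reconstruction rather than AVATAR's actual implementation: exons are vertices (here, coordinate tuples), each intron observed in an alignment is a directed edge, and the edge weight counts supporting ESTs. All names and coordinates are hypothetical.

```python
from collections import defaultdict


def build_splice_graph(alignments):
    """Map EST alignments to a graph: exons are vertices, introns are edges.

    alignments: list of ESTs, each an ordered list of (start, end) exon blocks.
    Returns {(upstream_exon, downstream_exon): number_of_supporting_ESTs}.
    """
    edges = defaultdict(int)
    for exons in alignments:
        # Each pair of consecutive exon blocks implies one intron.
        for upstream, downstream in zip(exons, exons[1:]):
            edges[(upstream, downstream)] += 1
    return dict(edges)


def alternative_sites(edges, min_support=1):
    """Report exons that splice to more than one downstream exon.

    Only introns supported by at least min_support ESTs are considered.
    """
    successors = defaultdict(set)
    for (src, dst), support in edges.items():
        if support >= min_support:
            successors[src].add(dst)
    return {src: sorted(dsts) for src, dsts in successors.items() if len(dsts) > 1}


# Hypothetical alignments: two ESTs include the middle exon, one skips it.
ests = [
    [(100, 200), (300, 400), (500, 600)],
    [(100, 200), (300, 400), (500, 600)],
    [(100, 200), (500, 600)],
]
graph = build_splice_graph(ests)
events = alternative_sites(graph, min_support=1)
```

Raising `min_support` to 2 drops the exon-skipping edge in this toy input, mirroring the "minimal number of ESTs supporting a splicing site" query parameter mentioned in the abstract.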

Availability


12.

Background  

There is no dedicated database available for Expressed Sequence Tags (EST) of the chili pepper (Capsicum annuum), although the interest in a chili pepper EST database is increasing internationally due to the nutritional, economic, and pharmaceutical value of the plant. Recent advances in high-throughput sequencing of the ESTs of chili pepper cv. Bukang have produced hundreds of thousands of complementary DNA (cDNA) sequences. Therefore, a chili pepper EST database was designed and constructed to enable comprehensive analysis of chili pepper gene expression in response to biotic and abiotic stresses.

13.

Background

Infection of plants by pathogens and the subsequent disease development involves substantial changes in the biochemistry and physiology of both partners. Analysis of genes that are expressed during these interactions represents a powerful strategy to obtain insights into the molecular events underlying these changes. We have employed expressed sequence tag (EST) analysis to identify rice genes involved in defense responses against infection by the blast fungus Magnaporthe oryzae and fungal genes involved in infectious growth within the host during a compatible interaction.

Results

A cDNA library was constructed with RNA from rice leaves (Oryza sativa cv. Hwacheong) infected with M. oryzae strain KJ201. To enrich for fungal genes, a subtraction library was constructed by PCR-based suppression subtractive hybridization, with RNA from infected rice leaves as the tester and RNA from uninfected rice leaves as the driver. A total of 4,148 clones from the two libraries were sequenced to generate 2,302 non-redundant ESTs. Of these, 712 and 1,562 ESTs could be identified as encoding fungal and rice genes, respectively. To predict gene function, Gene Ontology (GO) analysis was applied, with 31% and 32% of rice and fungal ESTs being assigned to GO terms, respectively. One hundred uniESTs were found to be specific to the fungal infection EST set. More than 80 full-length fungal cDNA sequences were used to validate ab initio annotated gene models of the M. oryzae genome sequence.

Conclusion

This study shows the power of ESTs to refine genome annotation and functional characterization. The results of this work have advanced our understanding of the molecular mechanisms underpinning fungal-plant interactions and form the basis for new hypotheses.

14.
15.

Background  

Research involving expressed sequence tags (ESTs) is intricately coupled to the existence of large, well-annotated sequence repositories. Comparatively complete and satisfactory annotated public sequence libraries are, however, available only for a limited range of organisms, rendering the absence of sequences and gene structure information a tangible problem for those working with taxa lacking an EST or genome sequencing project. Paralogous genes belonging to the same gene family but distinguished by derived characteristics are particularly prone to misidentification and erroneous annotation; high but incomplete levels of sequence similarity are typically difficult to interpret and have formed the basis of many unsubstantiated assumptions of orthology.

16.
17.

Background

Gene prediction is a challenging but crucial part in most genome analysis pipelines. Various methods have evolved that predict genes ab initio on reference sequences or evidence based with the help of additional information, such as RNA-Seq reads or EST libraries. However, none of these strategies is bias-free and one method alone does not necessarily provide a complete set of accurate predictions.

Results

We present IPred (Integrative gene Prediction), a method to integrate ab initio and evidence based gene identifications to complement the advantages of different prediction strategies. IPred builds on the output of gene finders and generates a new combined set of gene identifications, representing the integrated evidence of the single method predictions.
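
The abstract does not spell out how the predictions are combined, so the following is only a toy sketch of the general idea of integrating ab initio and evidence-based gene calls, not IPred's actual algorithm. It assumes a simple rule: an ab initio gene counts as supported if an evidence-based gene on the same sequence and strand overlaps it, and evidence-based genes without an ab initio counterpart are reported separately. The tuple layout and function names are hypothetical.

```python
def overlaps(a, b):
    """True if two genes share sequence, strand, and at least one base.

    Each gene is a tuple (seqid, strand, start, end) with inclusive coordinates.
    """
    return a[0] == b[0] and a[1] == b[1] and a[2] <= b[3] and b[2] <= a[3]


def integrate(ab_initio, evidence):
    """Split predictions into evidence-supported ab initio genes and
    evidence-only genes that no ab initio prediction overlaps."""
    supported = [g for g in ab_initio if any(overlaps(g, e) for e in evidence)]
    evidence_only = [e for e in evidence
                     if not any(overlaps(e, g) for g in ab_initio)]
    return supported, evidence_only


# Hypothetical predictions from two kinds of gene finders.
ab_initio = [("chr1", "+", 100, 500), ("chr1", "-", 900, 1200)]
evidence = [("chr1", "+", 150, 480), ("chr1", "+", 2000, 2500)]
supported, evidence_only = integrate(ab_initio, evidence)
```

In this toy input, the first ab initio gene is supported, the second has no evidence on its strand, and the gene at 2000-2500 is known only from evidence.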

Conclusion

We evaluate IPred in simulations and real data experiments on Escherichia coli and human data. We show that IPred improves the prediction accuracy in comparison to single method predictions and to existing methods for prediction combination.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1315-9) contains supplementary material, which is available to authorized users.

18.

Background

Most studies inferring species phylogenies use sequences from single copy genes or sets of orthologs culled from gene families. For taxa such as plants, with very high levels of gene duplication in their nuclear genomes, this has limited the exploitation of nuclear sequences for phylogenetic studies, such as those available in large EST libraries. One rarely used method of inference, gene tree parsimony, can infer species trees from gene families undergoing duplication and loss, but its performance has not been evaluated at a phylogenomic scale for EST data in plants.

Results

A gene tree parsimony analysis based on EST data was undertaken for six angiosperm model species and Pinus, an outgroup. Although a large fraction of the tentative consensus sequences obtained from the TIGR database of ESTs was assembled into homologous clusters too small to be phylogenetically informative, some 557 clusters contained promising levels of information. Based on maximum likelihood estimates of the gene trees obtained from these clusters, gene tree parsimony correctly inferred the accepted species tree with strong statistical support. A slight variant of this species tree was obtained when maximum parsimony was used to infer the individual gene trees instead.

Conclusion

Despite the complexity of the EST data and the relatively small fraction eventually used in inferring a species tree, the gene tree parsimony method performed well in the face of very high apparent rates of duplication.

19.

Background

Plants growing in their natural habitat represent a valuable resource for elucidating mechanisms of acclimation to environmental constraints. Populus euphratica is a salt-tolerant tree species growing in saline semi-arid areas. To identify genes involved in abiotic stress responses under natural conditions we constructed several normalized and subtracted cDNA libraries from control, stress-exposed and desert-grown P. euphratica trees. In addition, we identified several metabolites in desert-grown P. euphratica trees.

Results

About 14,000 expressed sequence tag (EST) sequences were obtained with a good representation of genes putatively involved in resistance and tolerance to salt and other abiotic stresses. A P. euphratica DNA microarray with a uni-gene set of ESTs representing approximately 6,340 different genes was constructed. The microarray was used to study gene expression in adult P. euphratica trees growing in the desert canyon of Ein Avdat in Israel. In parallel, 22 selected metabolites were profiled in the same trees.

Conclusion

Of the obtained ESTs, 98% were found in the sequenced P. trichocarpa genome and 74% in other Populus EST collections. This implies that the P. euphratica genome does not contain different genes per se, but that regulation of gene expression might be different and that P. euphratica expresses a different set of genes that contribute to adaptation to saline growth conditions. In addition, all five measured amino acids showed increased levels in trees growing in the more saline soil.

20.