Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
Expressed sequence tags (ESTs) have been widely used in gene survey research in recent years. The EST Pipeline System, software developed by Hangzhou Genomics Institute (HGI), can automatically analyze EST data sets of different scales using suitable methods. All the analysis reports, including those of vector masking, sequence assembly, gene annotation, Gene Ontology classification, and some other analyses, can be browsed and searched as well as downloaded in Excel format from the web interface, freeing researchers from routine data processing so that they can focus on the biological rules embedded in the data.

2.
3.
4.
5.
SplitsTree: analyzing and visualizing evolutionary data   Total citations: 15 (self-citations: 0; citations by others: 15)
MOTIVATION: Real evolutionary data often contain a number of different and sometimes conflicting phylogenetic signals, and thus do not always clearly support a unique tree. To address this problem, Bandelt and Dress (Adv. Math., 92, 47-105, 1992) developed the method of split decomposition. For ideal data, this method gives rise to a tree, whereas less ideal data are represented by a tree-like network that may indicate evidence for different and conflicting phylogenies. RESULTS: SplitsTree is an interactive program, for analyzing and visualizing evolutionary data, that implements this approach. It also supports a number of distance transformations, the computation of parsimony splits, spectral analysis and bootstrapping.
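As a rough illustration of the idea behind split decomposition, the sketch below computes the isolation index of a split A|B by brute force; a split with a positive index is kept, and its index serves as its weight in the resulting network. This is only a minimal sketch of the textbook definition, not SplitsTree's implementation. The function names and the four-taxon toy distances are hypothetical.

```python
from itertools import product


def isolation_index(A, B, d):
    """Brute-force isolation index of the split A|B under the distance function d."""
    best = float("inf")
    # Consider every pair from A and every pair from B (pairs may repeat an element).
    for a1, a2 in product(A, repeat=2):
        for b1, b2 in product(B, repeat=2):
            largest = max(
                d(a1, b1) + d(a2, b2),
                d(a1, b2) + d(a2, b1),
                d(a1, a2) + d(b1, b2),
            )
            best = min(best, largest - d(a1, a2) - d(b1, b2))
    return best / 2


# Hypothetical toy data: tree ((a,b),(c,d)) with pendant edges of length 1
# and an internal edge of length 2.
DIST = {
    frozenset("ab"): 2, frozenset("cd"): 2,
    frozenset("ac"): 4, frozenset("ad"): 4,
    frozenset("bc"): 4, frozenset("bd"): 4,
}


def toy_distance(x, y):
    """Look up the symmetric toy distance between two taxa."""
    return 0 if x == y else DIST[frozenset((x, y))]
```

On these tree-like toy distances, the split {a,b}|{c,d} recovers the internal edge length, while the conflicting split {a,c}|{b,d} gets index zero and would be discarded.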

6.
The microarray-based analysis of gene expression has become a workhorse for biomedical research. Managing the amount and diversity of data that such experiments produce is a task that must be supported by appropriate software tools, which led to the creation of literally hundreds of systems. In consequence, choosing the right tool for a given project is difficult even for the expert. We report on the results of a survey encompassing 78 such tools, of which 22 were inspected in detail and seven were tested hands-on. We report on our experiences with a focus on completeness of functionality, ease-of-use, and necessary effort for installation and maintenance. Thereby, our survey provides a valuable guideline for any project considering the use of a microarray data management system. It reveals which tasks are covered by mature tools and also shows that important requirements, especially in the area of integrated analysis of different experimental data, are not yet met satisfactorily by existing systems.

7.
Expressed sequence tags (ESTs) are generated and deposited in the public domain, as redundant, unannotated, single-pass reactions, with virtually no biological content. PipeOnline automatically analyses and transforms large collections of raw DNA-sequence data from chromatograms or FASTA files by calling the quality of bases, screening and removing vector sequences, assembling and rewriting consensus sequences of redundant input files into a unigene EST data set and finally through translation, amino acid sequence similarity searches, annotation of public databases and functional data. PipeOnline generates an annotated database, retaining the processed unigene sequence, clone/file history, alignments with similar sequences, and proposed functional classification, if available. Functional annotation is automatic and based on a novel method that relies on homology of amino acid sequence multiplicity within GenBank records. Records are examined through a function ordered browser or keyword queries with automated export of results. PipeOnline offers customization for individual projects (MyPipeOnline), automated updating and alert service. PipeOnline is available at http://stress-genomics.org.

8.
A recently published study(1) has identified a set of candidate genes for human diseases based on findings from Drosophila. Each human expressed sequence tag (EST) in a large database was compared with all known Drosophila genes. After eliminating matches between genes of already known function, the remaining sequences were mapped in the human genome. In each region, the phenotypes of all known human diseases were compared with the phenotypes of known Drosophila mutations in order to identify candidate genes for the human diseases. Are the correspondences real or coincidental?

9.
We analysed the publicly available expressed sequence tag (EST) collections for the genus Populus to examine whether evidence can be found for large-scale gene-duplication events in the evolutionary past of this genus. The ESTs were clustered into unigenes for each poplar species examined. Gene families were constructed for all proteins deduced from these unigenes, and K(S) dating was performed on all paralogs within a gene family. The fraction of paralogs was then plotted against the K(S) values, which resulted in a distribution reflecting the age of duplicated genes in poplar. Sufficient EST data were available for seven different poplar species spanning four of the six sections of the genus Populus. For all these species, there was evidence that a large-scale gene-duplication event had occurred. From our analysis it is clear that all poplar species have shared the same large-scale gene-duplication event, suggesting that this event must have occurred in the ancestor of poplar, or at least very early in the evolution of the Populus genus.
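The age distribution described above, the fraction of paralogs plotted against K(S), amounts to a normalized histogram. The sketch below shows one way to bin the values, assuming the K(S) estimates for the paralog pairs are already available. The function name, bin width and cutoff are hypothetical choices, not the authors' settings.

```python
def ks_fraction_histogram(ks_values, bin_width=0.1, max_ks=2.0):
    """Return the fraction of paralog pairs falling into each K(S) bin.

    Values outside [0, max_ks) are ignored, since very large K(S)
    estimates are commonly treated as saturated (an assumption here).
    """
    n_bins = int(round(max_ks / bin_width))
    counts = [0] * n_bins
    kept = 0
    for ks in ks_values:
        if 0 <= ks < max_ks:
            # Clamp the index to guard against floating-point edge effects.
            idx = min(int(ks / bin_width), n_bins - 1)
            counts[idx] += 1
            kept += 1
    if kept == 0:
        return [0.0] * n_bins
    return [c / kept for c in counts]
```

A peak in the resulting fractions, away from the youngest bin, is the kind of signal that would be read as a burst of duplications of similar age.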

10.
ESTAP--an automated system for the analysis of EST data   Total citations: 2 (self-citations: 0; citations by others: 2)
The EST Analysis Pipeline (ESTAP) is a set of analytical procedures that automatically verify, cleanse, store and analyze ESTs generated on high-throughput platforms. It uses a relational database to store sequence data and analysis results, which facilitates both the search for specific information and statistical analysis. ESTAP provides for easy viewing of the original and cleansed data, as well as the analysis results via a Web browser. It also allows the data owner to submit selected sequences to dbEST in a semi-automated fashion.

11.

Background  

The eukaryotic cell has an intricate architecture with compartments and substructures dedicated to particular biological processes. Knowing the subcellular location of proteins not only indicates how bio-processes are organized in different cellular compartments, but also contributes to unravelling the function of individual proteins. Computational localization prediction is possible based on sequence information alone, and has been successfully applied to proteins from virtually all subcellular compartments and all domains of life. However, we realized that current prediction tools do not perform well on partial protein sequences such as those inferred from Expressed Sequence Tag (EST) data, limiting the exploitation of the large and taxonomically most comprehensive body of sequence information from eukaryotes.

12.
Increasing numbers of phylogeographic studies make comparative inferences about the histories of co-distributed species. Although the aims of such studies are best achieved by jointly analysing sequences from multiple loci in a model-based framework, such data currently exist for few nonmodel systems. We used existing genomic data and expressed sequence tags (ESTs) for Hymenoptera and other insects to design intron-crossing primers for 40 loci, mainly ribosomal proteins, for chalcidoid parasitoids. Amplification success was scored on a range of taxa associated with two natural communities: oak galls and figs. Taxa were chosen at increasing distance from Nasonia, which was used for primer design, (i) within Pteromalids, (ii) within Chalcidoidea (Eupelmidae, Eulophidae, Eurytomidae, Ormyridae, Torymidae) and (iii) for a selection of distantly related gall and fig wasps (Cynipidae, Agaonidae). To assess the utility of these loci for phylogeographic and population genetic studies, we compared genetic diversity between Western Palaearctic refugia for two species. Our results show that it is feasible to design a large number of exon-primed-intron-crossing (EPIC) loci that may be informative about phylogeographic history within species but amplify across a large taxonomic range.

13.
14.
Modern molecular techniques have revealed an extraordinary diversity of microorganisms, most of which are as yet uncharacterized. This poses a major challenge to microbial ecologists: how can one compare the microbial diversity of different environments when the vast majority of microbial taxa are usually unknown? Three statistical approaches developed by ecologists and evolutionary biologists--parametric estimation, nonparametric estimation and community phylogenetics--are proving to be promising tools to meet this challenge. The combination of these tools with molecular biology techniques allows the rigorous estimation and comparison of microbial diversity in different environments.
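To make the notion of nonparametric estimation concrete, the sketch below implements the bias-corrected Chao1 richness estimator, one widely used estimator of this kind. The abstract does not name a specific estimator, so the choice of Chao1 is an assumption made purely for illustration, and the function name is hypothetical.

```python
def chao1(abundances):
    """Bias-corrected Chao1 estimate of total taxon richness.

    `abundances` holds the number of sequences observed for each taxon.
    The estimate adds to the observed richness a correction based on the
    counts of singletons (f1) and doubletons (f2).
    """
    observed = [n for n in abundances if n > 0]
    s_obs = len(observed)
    f1 = sum(1 for n in observed if n == 1)  # taxa seen exactly once
    f2 = sum(1 for n in observed if n == 2)  # taxa seen exactly twice
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
```

The intuition is that many taxa seen only once suggest many more that were not seen at all, so the estimate rises above the observed count as the proportion of singletons grows.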

15.

Background  

The omics fields promise to revolutionize our understanding of biology and biomedicine. However, their potential is compromised by the challenge of analyzing the huge datasets produced. Analysis of omics data is plagued by the curse of dimensionality, resulting in imprecise estimates of model parameters and performance. Moreover, the integration of omics data with other data sources is difficult to shoehorn into classical statistical models. This has resulted in ad hoc approaches to address specific problems.

16.
17.
Trends in Microbiology, 2023, 31(7): 707-722
The human microbiome is intimately related to cancer biology and plays a vital role in the efficacy of cancer treatments, including immunotherapy. Extraordinary evidence has revealed that several microbes influence tumor development through interaction with the host immune system, that is, immuno–oncology–microbiome (IOM). This review focuses on the intratumoral microbiome in IOM and describes the available data and computational methods for discovering biological insights of microbial profiling from host bulk, single-cell, and spatial sequencing data. Critical challenges in data analysis and integration are discussed. Specifically, the microorganisms associated with cancer and cancer treatment in the context of IOM are collected and integrated from the literature. Lastly, we provide our perspectives for future directions in IOM research.

18.
Clustering expressed sequence tags (ESTs) is a powerful strategy for gene identification, gene expression studies and identifying important genetic variations such as single nucleotide polymorphisms. To enable fast clustering of large-scale EST data, we developed PaCE (for Parallel Clustering of ESTs), a software program for EST clustering on parallel computers. In this paper, we report on the design and development of PaCE and its evaluation using Arabidopsis ESTs. The novel features of our approach include: (i) design of memory-efficient algorithms to reduce the memory required to linear in the size of the input, (ii) a combination of algorithmic techniques to reduce the computational work without sacrificing the quality of clustering, and (iii) use of parallel processing to reduce run-time and facilitate clustering of larger data sets. Using a combination of these techniques, we report the clustering of 168 200 Arabidopsis ESTs in 15 min on an IBM xSeries cluster with 30 dual-processor nodes. We also clustered 327 632 rat ESTs in 47 min and 420 694 Triticum aestivum ESTs in 3 h and 15 min. We demonstrate the quality of our software using benchmark Arabidopsis EST data, and by comparing it with CAP3, a software package widely used for EST assembly. Our software allows clustering of much larger EST data sets than is possible with current software. Because of its speed, it also facilitates multiple runs with different parameters, providing biologists with a tool to better analyze EST sequence data. Using PaCE, we clustered EST data from 23 plant species and the results are available at the PlantGDB website.
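The general shape of EST clustering can be sketched as follows: sequences that share evidence of overlap are merged into the same cluster using a union-find structure. This is a deliberately simplified, serial illustration and not PaCE's algorithm; the exact shared k-mer test stands in for a real overlap criterion, reverse complements are ignored, and the function name and the default `k` are hypothetical.

```python
def cluster_sequences(seqs, k=8):
    """Group sequences that share at least one exact k-mer (single linkage).

    Returns a sorted list of clusters, each a list of indices into `seqs`.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Find the cluster representative, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # k-mer -> index of the first sequence containing it
    for i, s in enumerate(seqs):
        for p in range(len(s) - k + 1):
            kmer = s[p:p + k]
            j = first_seen.setdefault(kmer, i)
            if j != i:
                # Shared k-mer: merge the clusters of sequences i and j.
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Indexing k-mers rather than comparing every pair of sequences keeps the work roughly proportional to the input size, which echoes, in a much cruder form, the goal of avoiding unnecessary pairwise comparisons.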

19.
20.

Background  

Expressed sequence tag (EST) collections are composed of a high number of single-pass, redundant, partial sequences, which need to be processed, clustered, and annotated to remove low-quality and vector regions, eliminate redundancy and sequencing errors, and provide biologically relevant information. In order to provide a suitable way of performing the different steps in the analysis of the ESTs, flexible computation pipelines adapted to the local needs of specific EST projects have to be developed. Furthermore, EST collections must be stored in highly structured relational databases available to researchers through user-friendly interfaces which allow efficient and complex data mining, thus offering maximum capabilities for their full exploitation.
